SphinxQL: fetching random records?

In SphinxQL, how do I get random records from an index?
SELECT fname FROM indexname WHERE Age>=21 and Age<=47 random 0,4 \G;
I get the following error:
sphinxql: syntax error, unexpected CONST_INT, expecting BETWEEN (or 8 other tokens) near '0,4
Is there another way to get random records in SphinxQL?

All you need is
... ORDER BY RAND() LIMIT 4
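Applied to the query from the question, that could look like the sketch below. It reuses the index and attribute names from the question and assumes a Sphinx version recent enough to accept ORDER BY RAND().

```sql
-- Random sample of 4 matching rows; LIMIT 4 is shorthand for LIMIT 0,4
SELECT fname FROM indexname WHERE Age>=21 AND Age<=47 ORDER BY RAND() LIMIT 4;
```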

Syntax error when trying to populate column with count of unique values in another column

I'm trying to count the number of unique pool operators for every permit # in a table but am having trouble putting this value in a new column dedicated to that count.
So I have 2 tables: doh_analysis; doh_pools.
Both of these tables have a "permit" column (TEXT), but doh_analysis has about 1000 rows with duplicates in the permit column but occasional unique values in the operator column (TEXT).
I'm trying to fill a column "operator_count" in the table "doh_pools" with a count of unique values in "pooloperator" for each permit #.
So I tried the following code but am getting a syntax error at or near "(":
update doh_pools
set operator_count = select count(distinct doh_analysis.pooloperator)
from doh_analysis
where doh_analysis.permit ilike doh_pools.permit;
When I remove the "select" from before the "count" I get "SQL Error [42803]: ERROR: aggregate functions are not allowed in UPDATE".
I can successfully query a list of distinct permit-pooloperator pairs using:
select distinct permit, pooloperator
from doh_analysis;
And I can query the # of unique pooloperators per permit 1 at a time using:
select count(distinct pooloperator)
from doh_analysis
where permit ilike '52-60-03054';
But I'm struggling to insert a count of unique pairs for each permit # in the operator_count column.
Is there a way to do this?
There is certainly a better way of doing this, but I accomplished my goal by creating 2 intermediate tables and then updating the target table with values from the 2nd intermediate table, like so:
select distinct permit, pooloperator
into doh_pairs
from doh_analysis;
select permit, count(distinct pooloperator)
into doh_temp
from doh_pairs
group by permit;
select count(distinct permit)
from doh_temp;
update doh_pools
set operator_count = doh_temp.count
from doh_temp
where doh_pools.permit ilike doh_temp.permit
and doh_pools.permit is not NULL
returning count;
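For comparison, a single-statement version is sketched below. The syntax error in the original attempt most likely comes from the subquery not being wrapped in parentheses: a scalar subquery used as a value in SET has to be parenthesized. The table and column names are the ones from the question; the alias a is only for illustration.

```sql
-- Correlated scalar subquery: evaluated once per row of doh_pools.
-- Pools with no matching permit in doh_analysis get a count of 0.
UPDATE doh_pools
SET operator_count = (
    SELECT count(DISTINCT a.pooloperator)
    FROM doh_analysis a
    WHERE a.permit ILIKE doh_pools.permit
);
```

This avoids the intermediate tables, though on large tables the join-based update above may be faster because the counts are computed once rather than per row.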

Postgresql (Aurora) Sum from jsonb gives odd error

I'm trying to sum values from a jsonb type column in a table in an Aurora/Postgres database but it doesn't seem to work.
select (payload->>'loanAmount')::int from rfqs limit 1;
Gives a result of 10000 (int4).
select sum((payload->>'loanAmount')::int) from rfqs limit 1;
Gives a result of: ERROR: invalid input syntax for integer: "2000.5"
It seems like this is something to do with the way the ->> operator converts the json to a string, but it's like something is wrong with that string which prevents it from being correctly typecast to an int.
As a test I did select SUM(('10000'::int)); which worked fine and returned 10000 as expected.
Any ideas?
This will allow you to understand (you will see what the problem is with "::int")
select sum(payload->>'loanAmount') from rfqs
which is the same as:
select sum(payload->>'loanAmount') from rfqs limit 1
(An aggregate without GROUP BY returns only one row, so "limit 1" is superfluous.)
Try
SELECT sum(to_number((payload->>'loanAmount'),'999999999D9999')) from rfqs
see http://www.sqlfiddle.com/#!17/9c30a/8
Some of your "loanAmount" properties do not have integer values. The first record does, though.
To find the offending records:
SELECT payload FROM rfqs WHERE (payload->>'loanAmount')::numeric <> trunc((payload->>'loanAmount')::numeric)
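If fractional amounts such as "2000.5" are valid data, one option is to cast to numeric instead of int before summing. This is a minimal sketch that assumes every non-null loanAmount is a plain decimal string; the column alias total_loan_amount is made up for illustration.

```sql
-- numeric accepts both '10000' and '2000.5', so the cast no longer fails
SELECT sum((payload->>'loanAmount')::numeric) AS total_loan_amount
FROM rfqs;
```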

How to group by similar values with pg_trgm

I have the following table
id error
- ----------------------------------------
1 Error 1234eee5, can not write to disk
2 Error 83457qwe, can not write to disk
3 Error 72344ee, can not write to disk
4 Fatal barier breach on object 72fgsff
5 Fatal barier breach on object 7fasdfa
6 Fatal barier breach on object 73456xcc5
I want to get a result that counts by similarity, where a similarity of > 80% means two errors are equal. I've been using the pg_trgm extension, and its similarity function works perfectly for me; the only thing I can't figure out is how to produce the grouped result below.
Error Count
------------------------------------- ------
Error 1234eee5, can not write to disk, 3
Fatal barier breach on object 72fgsff, 3
Basically you could join a table with itself to find similar strings, however this approach will end in a terribly slow query on a larger dataset. Also, using similarity() may cause inaccuracy in some cases (you need to find the appropriate limit value).
You should try to find patterns. For example, if all variable words in strings begin with a digit, you can mask them using regexp_replace():
select id, regexp_replace(error, '\d\w+', 'xxxxx') as error
from errors;
id | error
----+-------------------------------------
1 | Error xxxxx, can not write to disk
2 | Error xxxxx, can not write to disk
3 | Error xxxxx, can not write to disk
4 | Fatal barier breach on object xxxxx
5 | Fatal barier breach on object xxxxx
6 | Fatal barier breach on object xxxxx
(6 rows)
so you can easily group the data by error message:
select regexp_replace(error, '\d\w+', 'xxxxx') as error, count(*)
from errors
group by 1;
error | count
-------------------------------------+-------
Error xxxxx, can not write to disk | 3
Fatal barier breach on object xxxxx | 3
(2 rows)
The above query is only an example as the specific solution depends on the data format.
Using pg_trgm
The solution based on the OP's idea (see the comments below). The limit 0.8 for similarity() is certainly too high. It seems that it should be somewhere about 0.6.
The table for unique errors (I've used a temporary table, but it could of course also be a regular one):
create temp table if not exists unique_errors(
id serial primary key,
error text,
ids int[]);
The ids column stores the ids of the rows of the base table that contain similar errors.
do $$
declare
    e record;
    found_id int;
begin
    truncate unique_errors;
    for e in select * from errors loop
        select min(id)
        into found_id
        from unique_errors u
        where similarity(u.error, e.error) > 0.6;
        if found_id is not null then
            update unique_errors
            set ids = ids || e.id
            where id = found_id;
        else
            insert into unique_errors (error, ids)
            values (e.error, array[e.id]);
        end if;
    end loop;
end $$;
The final results:
select *, cardinality(ids) as count
from unique_errors;
id | error | ids | count
----+---------------------------------------+---------+-------
1 | Error 1234eee5, can not write to disk | {1,2,3} | 3
2 | Fatal barier breach on object 72fgsff | {4,5,6} | 3
(2 rows)
For this particular case you could just group by left(error, 5), which would lead to two groups: one containing all the strings starting with Error, the other containing all the strings starting with Fatal. This criterion would have to be updated if you are planning to add more error types.
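A sketch of that prefix-based grouping, using the errors table from the earlier examples (the alias error_prefix is made up for illustration):

```sql
-- Groups rows by the first 5 characters of the message
select left(error, 5) as error_prefix, count(*)
from errors
group by 1;
```

With the sample data this should yield one group for the prefix Error and one for Fatal, each with 3 rows.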

Issue with using percentile_cont function in Postgresql

This is my table
ID Total
1 2019.21
3 87918.32
2 562900.3
3 982688.98
1 56788.34
2 56792.32
3 909728.23
Now I would like to find the 25th, 50th, 75th, 90th and 100th percentiles of the values (Total) in the above table. Assume my table contains a large amount of data (some 2 million records in the same format). I've used the following code:
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY Total) as disc_func
FROM my_table
The error I get:
ERROR: syntax error at or near "("
LINE 3: percentile_disc(0.5) WITHIN GROUP (ORDER BY total...
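You are using a PostgreSQL version older than 9.4, which does not support WITHIN GROUP (ordered-set aggregates were added in 9.4). Compare the aggregate function documentation for 9.4 and 9.3: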
https://www.postgresql.org/docs/9.4/static/functions-aggregate.html
https://www.postgresql.org/docs/9.3/static/functions-aggregate.html
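On 9.4 or later, all the requested percentiles can be computed in one pass by passing an array of fractions to percentile_cont. The sketch below uses the table and column names from the question; the alias percentiles is made up for illustration.

```sql
-- Check the server version first
SELECT version();

-- Returns one array with the 25th, 50th, 75th, 90th and 100th percentiles
SELECT percentile_cont(ARRAY[0.25, 0.5, 0.75, 0.9, 1.0])
       WITHIN GROUP (ORDER BY Total) AS percentiles
FROM my_table;
```

percentile_cont interpolates between neighbouring values, while percentile_disc (used in the question) returns an actual value from the table; both accept an array of fractions.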

Strange results when using a WHERE filter in CQL (Cassandra)

I have a counter column family created with the command below (KEY is a bigint that I use for filtering in queries):
CREATE TABLE BannerCount (
KEY bigint PRIMARY KEY
) WITH
comment='' AND
comparator=text AND
read_repair_chance=0.100000 AND
gc_grace_seconds=864000 AND
default_validation=counter AND
min_compaction_threshold=4 AND
max_compaction_threshold=32 AND
replicate_on_write='true' AND
compaction_strategy_class='SizeTieredCompactionStrategy' AND
compression_parameters:sstable_compression='SnappyCompressor';
But when I insert data into this column family and select with a WHERE clause to filter it, the results I get are very strange, like this:
Query:
select count(1) From BannerCount where KEY > -1
count
-------
71
Query:
select count(1) From BannerCount where KEY > 0;
count
-------
3
Query:
select count(1) From BannerCount ;
count
-------
122
What is happening with my query? Can anyone tell me why I get these results?
To understand the reason for this, you need to understand Cassandra's data model. You're probably using RandomPartitioner here, so each KEY value in your table is hashed to a token value, and the rows are stored in a distributed way around your ring according to those tokens.
So finding all rows whose key has a higher value than X isn't the sort of query Cassandra is optimized for. You should probably be keying your rows on some other value, and then using either wide rows for your bigint values (since columns are sorted) or put them in a second column, and create an index on it.
To explain in a little more detail why your results seem strange: CQL 2 implicitly turns "KEY >= X" into "token(KEY) >= token(X)", so that a querier can iterate through all the rows in a somewhat-efficient way. So really, you're finding all the rows whose hash is greater than the hash of X. See CASSANDRA-3771 for how that confusion is being resolved in CQL 3. That said, the proper fix for you is to structure your data according to the queries you expect to be running on it.