In my keyspace I have posts like this:
posts = {
#key
'post1': {
# columns and value
'url': 'foobar.com/post1',
'body': 'Currently has client support FOOBAR for the following programming languages..',
},
'post2': {
'url': 'foobar.com/post2',
'body': 'The table with the following table FOOBAR structure...',
},
# ... ,
}
How do I create a LIKE query in Cassandra to get all posts that contain the word 'FOOBAR'?
In SQL it would be SELECT * FROM POST WHERE BODY LIKE '%FOOBAR%', but what is the equivalent in Cassandra?
The only way to do this efficiently is to use a full-text search engine like https://github.com/tjake/Solandra (Solr-on-cassandra). Of course you can roll your own using the same techniques manually, but usually this is not called for.
Note that this is true for SQL databases too: they will translate %FOO% to a table scan, unless you use a FTS extension like postgresql's tsearch2.
You might create another column family where the keys are the domains, and the values are the keys in your original column family. That way you could refer to records within a specific domain directly.
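A minimal sketch of that idea in modern CQL (the table and column names here are hypothetical):
CREATE TABLE posts_by_domain (
    domain   text,
    post_key text,   -- key of the post in the original posts column family
    PRIMARY KEY (domain, post_key)
);
-- fetch all post keys for a given domain
SELECT post_key FROM posts_by_domain WHERE domain = 'foobar.com';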
Cassandra 3.4 added support for LIKE in CQL (via SASI secondary indexes), so it is finally available natively.
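For example, a minimal sketch assuming a posts table with a text body column (the index name is illustrative and SASI must be available on the cluster):
-- SASI index in CONTAINS mode supports '%FOOBAR%'-style patterns
CREATE CUSTOM INDEX posts_body_idx ON posts (body)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'CONTAINS'};
SELECT * FROM posts WHERE body LIKE '%FOOBAR%';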
Is it possible to do a greater-than search across a jsonb field using Hasura?
It looks to be possible in PostgreSQL itself, as in How can I do less than, greater than in JSON Postgres fields?
In Postgres I'm storing a table
asset
name: string
version: int
metadata: jsonb
The metadata looks like this:
{'length': 5}
I am able to find assets that match exactly using _contains:
{
asset(where:{metadata : {_contains : {length: 5}}}){
name
metadata
}
}
I would like to be able to find assets with a length over 10.
I tried:
{
asset(where:{metadata : {_gt : {length: 10}}}){
name
metadata
}
}
A. Doing it at the GraphQL level directly
The Hasura documentation, JSONB operators (_contains, _has_key, etc.), mentions only these operators:
The _contains, _contained_in, _has_key, _has_keys_any and _has_keys_all operators are used to filter based on JSONB columns.
So the direct answer to your question is: no, it's not possible at the GraphQL level in Hasura.
(At least it's not possible yet. Who knows: maybe more operators will be implemented in future releases.)
B. Using derived views
But there is another way, the one explained in https://hasura.io/blog/postgres-json-and-jsonb-type-support-on-graphql-41f586e47536/#derived-data
This recommendation is repeated in https://github.com/hasura/graphql-engine/issues/6331:
We don't have operators like that for JSONB (might be solved by something like #5211) but you can use a view or computed field to flatten the text field from the JSONB column into a new column and then do a like on that.
The recipe is:
1. Create a view
CREATE VIEW assets -- note the plural; name the view according to your style guide
AS
SELECT
name,
version,
metadata,
(metadata->>'length')::int as meta_len -- cast to other number type if needed
FROM asset
2. Register this view
3. Use it in GraphQL queries as a usual table
E.g.
query{
assets(where: {meta_len: {_gt:10}}){
name
metadata
}
}
C. Using SETOF functions
1. Create a SETOF function
CREATE FUNCTION get_assets(min_length int DEFAULT 0)
RETURNS SETOF asset
LANGUAGE SQL
STABLE
AS $$
SELECT * FROM asset
WHERE
(metadata->>'length')::int > min_length;
$$;
2. Register it in Hasura
3. Use in queries
query{
get_assets(args: {min_length: 10}){
name
metadata
}
}
I think that was the last possible option.
It will not give you the full "schemaless freedom" you may be looking for, but I don't know of other ways.
I looked at the documentation of Amazon Redshift and I'm not able to see a function which will give me what I want.
https://docs.aws.amazon.com/redshift/latest/dg/json-functions.html
I have a column in my database which contains JSON like this
{'en_IN-foo':'bla bla', 'en_US-foo':'bla bla'}
I want to extract all keys from the JSON which contain foo, so I want to extract:
en_IN-foo
en_US-foo
How can I get what I want? The closest to my requirement is the JSON_EXTRACT_PATH_TEXT function, but that can only extract a value when you know the key name. In my case I want all keys which match a pattern, but I don't know the key names.
I also tried abandoning the JSON functions and going the regex way. I wrote this:
select distinct regexp_substr('{en_in-foo:FOO, en_US-foo:BAR}','[^.]{5}-foo')
but this finds only the first match. I need all the matches.
Redshift is not flexible with JSON, so I don't think getting keys from an arbitrary JSON document is possible. You need to know the keys upfront.
Option 1
If possible change your JSON document to have a static schema:
{"locale":"en_IN", "foo": "bla bla"}
Or even
{"locale":"en_IN", "name": "foo", "value": "bla bla"}
Option 2
I can see that your prefix may be known to you, as it looks like the locale. What you could do is create a static table of locales and then CROSS JOIN it with your JSON column.
locales_table:
Id | locale
----------------
1 | en_US
2 | en_IN
The query would look like this:
SELECT
JSON_EXTRACT_PATH_TEXT(json_column, locale || '-foo', TRUE) as foo_at_locale
FROM json_table
CROSS JOIN locales_table
WHERE foo_at_locale IS NOT NULL
I have created a table with a JSONB column called "data".
A sample value of that column is:
[{field_id:1, value:10},{field_id:2, value:"some string"}]
There are multiple rows like this.
What I want:
I want to use aggregate functions on the "data" column so that I get
the sum of all values where field_id = 1;
the average of all values where field_id = 1.
I have searched a lot on Google but was not able to find a proper solution.
Sometimes it says "field doesn't exist" and sometimes it says "from clause missing".
I tried referring to it as data.value, then data -> value, and lastly data ->> value,
but nothing works.
Please let me know the solution if anyone knows.
Thanks in advance.
Your attributes should be something like this, so you instruct it to run the function on a specific value:
attributes: [
  // cast the extracted text to numeric so SUM/AVG can run on it
  [sequelize.fn('sum', sequelize.literal("(data->>'value')::numeric")), 'json_sum'],
  [sequelize.fn('avg', sequelize.literal("(data->>'value')::numeric")), 'json_avg']
]
Then in WHERE, you reference field_id in a similar way, using literal():
where: sequelize.literal("(data->>'field_id')::int = 1")
Your example also included a string for the value of "value", which of course won't work as a number. But if the basic Sequelize setup works on a good set of data, you can enhance the WHERE clause to test for numeric "value" data; there are good examples here: Postgres query to check a string is a number
Hopefully this gets you close. In my experience with Sequelize + Postgres, it helps to run the program in such a way that you see what queries it creates, like in a terminal where the output is streaming. On the way to a working statement, you'll either create objects which Sequelize doesn't like, or Sequelize will create bad queries which Postgres doesn't like. If the query looks close, take it into pgAdmin for further work, then try to reproduce your adjustments in Sequelize. Good luck!
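For reference, here is a minimal sketch of the SQL you are ultimately aiming for, assuming each row's data column holds a flat object such as {"field_id": 1, "value": 10} (the table name my_table is hypothetical):
SELECT
    SUM((data->>'value')::numeric) AS json_sum,
    AVG((data->>'value')::numeric) AS json_avg
FROM my_table
WHERE (data->>'field_id')::int = 1;
If that runs cleanly in pgAdmin, the attributes and where options shown above should make Sequelize generate something equivalent.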
I'm coding a domain, URL and regex filter (like SquidGuard) for Squid using the eCAP protocol, and I want to store all the domains in a PostgreSQL database. The problem is that when I do a search with LIKE, for example:
SELECT website_groups.id,
"name",
description
FROM website_domains
JOIN website_groups ON website_group_id = website_groups.id
WHERE (website_domains.domain = 'google.com'
OR website_domains.domain LIKE '%.google.com')
the query over 1,605,923 tuples takes about 490 ms, which is too slow for every request going through the Squid proxy.
My question is: how can I optimize PostgreSQL to make that query faster, or do I need to use a NoSQL database? (I tested with MongoDB and the query took 609 ms with less data.)
I tried full-text search, but it uses an English tokenizer and the data are URLs (www.google.com/query?data1=3), domains (bing.com) and regexes (.*.cu).
You may try to create a column for the reverse domain string and create an index on it:
ALTER TABLE website_domains ADD reverse_domain VARCHAR(100);
UPDATE website_domains SET reverse_domain = REVERSE(domain);
CREATE INDEX reverse_domain_index ON
website_domains (reverse_domain varchar_pattern_ops);
varchar_pattern_ops allows LIKE to use this index if possible.
The prefix search is done by reversing the pattern as well:
... OR website_domains.reverse_domain LIKE REVERSE('%.google.com')
You can probably avoid the extra column with an expression index:
CREATE INDEX reverse_domain_index ON
website_domains (REVERSE(domain) varchar_pattern_ops);
and the following clause:
.. OR REVERSE(website_domains.domain) LIKE REVERSE('%.google.com')
But you should test it to be sure.
In my case I have SQL for structured data and I am considering Lucene for text search.
Yes, MSSQL has full-text search, but Lucene offers some features I want.
For the purpose of the question, assume any external search engine.
In SQL there is a main table with a PK.
In SQL there are a number of queries that use the main table and a number of other tables.
From the external search I will get a list of Main.PK values to filter by.
That list could contain from 1 to 1 million values.
The external search is the most expensive part of the search; the SQL part is very efficient. Passing the SQL PKs to the external search is not really a good option, as I need various data from the SQL query. The only thing coming back from Lucene is the PK (term) and sometimes the score.
Is there a best practice?
Options I see are:
where Main.PK in (PK values from external search)
populate the external search PK values in a #temp table and join to that; since I sometimes need the score, this seems best as I can put the score in the #temp table (see the sketch after this list)
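A minimal T-SQL sketch of that #temp approach (table and column names are illustrative; the PK/score rows would be inserted from whatever the external search returns):
CREATE TABLE #search_hits (PK int PRIMARY KEY, score float);
-- ... insert the PK/score pairs returned by the external search here ...
SELECT m.*, h.score
FROM Main AS m
JOIN #search_hits AS h ON h.PK = m.PK;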
In an ideal world there would be a join like this:
join externalvirtualtable as evt
on evt.PK = Main.PK
and syntax specific to the external search
I get that is asking a lot but is there anything like that in general?
Is there a syntax/API to make an external search look like a table (or view) to MSSQL?
Is there anything like that for MSSQL to Lucene?
This is kind of a start: OLE DB Providers and OPENROWSET.
Ideally there would be a .NET Framework Data Provider for Lucene that mapped some SQL syntax to Lucene.
The app is .NET in case there is a .NET specific solution.
The product RavenDB combines structured and unstructured (Lucene) search very fast, even when Lucene returns a lot of rows, so there has to be a way to do this short of putting PKs in a #temp table.
Is there a syntax/API to make an external search look like a table (or view) to MSSQL?
You can use the IndexSearcher class of Lucene; it will give you a TopDocs object that contains the relevant documents (PKs in your case). Then you can populate a SQL table based on this result.
You will need something like this:
TopDocs topDocs = searcher.search(query, MAX_HITS);
for (int i = 0; i < topDocs.scoreDocs.length; i++) {
Document doc = searcher.doc(topDocs.scoreDocs[i].doc);
String pk = doc.get("PK");
// Connection to database and executing insertion
}