Search for two keywords in a full-text field - Sphinx

I need to search for two keywords, or for values between two bounds.
Eg 1: I am going to buy Mac Air and Mac Mini for my brother. Here I need to get all records that contain the keywords Mac Air and Mac Mini.
Eg 2: I will buy a car costing between $5000 and $10000. Here I want to search for a range.
Eg 3.1: I have some second-hand goods (x, y, z): x costs $300, y costs $560, z costs $50. If anyone needs to buy these, please contact 1800345123123.
Eg 3.2: I have some second-hand goods (a, b, c): a costs $1300, b costs $660*, c costs $50. If anyone needs to buy these, please contact 1800345123123.
In this example I need to filter by a minimum and maximum price. Say I search for $100 to $600: I should get Eg 3.1 but not Eg 3.2.
How can I do all this in Sphinx?

You need a separate filter for each kind of search: a range filter for the car cost, a simple keyword search for Mac Air and Mac Mini, and a separate filter for the second-hand goods. This cannot be accomplished in a single search query.
For the price range, you can use the following:
$cl->SetFilterRange ( $attribute, $min, $max, $exclude=false )
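A range filter only works on an attribute, so the prices first have to be pulled out of the text at indexing time (for example into a multi-value attribute). As far as I know, a range filter on a multi-value attribute keeps a document when any one of its values falls inside [min, max]; the Python sketch below imitates that per-document test. The regex, `extract_prices` and `any_price_in_range` are hypothetical helpers, not part of the Sphinx API.

```python
import re

# Hypothetical extraction step: pull dollar amounts such as "$560" out of the text.
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?)")


def extract_prices(text):
    """Return every dollar amount found in the text as a float."""
    return [float(p) for p in PRICE_RE.findall(text)]


def any_price_in_range(text, low, high):
    """Keep the document if at least one extracted price lies in [low, high]."""
    return any(low <= price <= high for price in extract_prices(text))


eg31 = "x cost $300 y cost $560 z cost $50.If any one need to buy this"
eg32 = "a cost $1300 b cost $660* c cost $50.If any one need to buy this"
print(any_price_in_range(eg31, 100, 600))  # True: $300 and $560 are in range
print(any_price_in_range(eg32, 100, 600))  # False: $1300, $660, $50 are all outside
```

With $100 to $600 this returns Eg 3.1 and not Eg 3.2, which is the behaviour asked for. The phone number is ignored because it has no leading "$".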
For Mac Air and Mac Mini:
$cl->SetMatchMode ( SPH_MATCH_ANY );
When a visitor enters "Mac", they will get all matching results.
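Note that ANY-style matching is broad: a query for "Mac Air Mac Mini" also matches a post that only mentions "Mac Pro", because the single word "mac" is enough. If both phrases must appear, Sphinx's extended query syntax (e.g. SPH_MATCH_EXTENDED2 with a query like "Mac Air" "Mac Mini") should express that; check the documentation for your Sphinx version. The Python below is only a toy model of the two behaviours, with a naive tokenizer; the function names are hypothetical.

```python
import re


def tokens(text):
    """Lower-case word tokens, roughly like a simple full-text tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_any(text, query):
    """ANY-style matching: true if at least one query word occurs in the text."""
    doc = set(tokens(text))
    return any(word in doc for word in tokens(query))


def contains_phrase(text, phrase):
    """True if the phrase's words occur consecutively in the text."""
    doc, words = tokens(text), tokens(phrase)
    n = len(words)
    return any(doc[i:i + n] == words for i in range(len(doc) - n + 1))


def match_all_phrases(text, phrases):
    """Every phrase must be present, like quoted phrases joined by AND."""
    return all(contains_phrase(text, p) for p in phrases)


post = "I am going to buy Mac Air and Mac Mini for my brother."
other = "Selling my old Mac Pro."
print(match_any(other, "Mac Air Mac Mini"))               # True: "mac" alone matches
print(match_all_phrases(other, ["Mac Air", "Mac Mini"]))  # False
```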
For the other two examples, I need more details on the database schema.

Related

Determine count of locations within certain distance of each other

Good Morning/Afternoon,
I'm fairly new to SQL in general, but I've been able to put together a few queries that give me the output I need for a certain business case. Now I have a new requirement that I am struggling with.
In a nutshell, I am extracting a list of "sites" (physical locations) for a given customer along with the lat/lon, and then feeding each location to Google Maps to plot a point for each site. What I am trying to do is query for unique sites. If there are multiple sites within 1/2 mile of each other, they should be lumped together as a single site. For example, if a customer has 10 sites within 1/2 mile of each other, they technically have 1 site, not 10.
Here is an example of what I am doing:
select c.id, i.site_id, s.name, max(i.captured_at) AS last_captured, s.center_lat, s.center_lng, CONCAT(s.center_lat, ',', s.center_lng) AS LOCATION
from images i
inner join sites s on s.id = i.site_id
inner join customers c on c.id = s.customer_id
where i.hidden = 'false' and i.copied_from_id is null and i.status = 'complete' and c.id = '353'
group by c.id, i.site_id, s.name, s.center_lat, s.center_lng
order by last_captured DESC
Here is an example of the output:
As it stands now, it returns a count of 4 sites (I am rendering the results in Google Data Studio displaying the count of records returned), which works fine for another scenario. However, since these sites are within 1/2 mile of each other, they are technically 1 site, not 4. I am trying to determine how to come up with a count of 1 site vs. 4 in this scenario. If there was another entry where the lat/lon (location) was more than 1/2 mile away, I would be looking for a count of 2 sites. I hope this all makes sense.
I'm currently trying to research where to start, so any references or a push in the right direction would be awesome. Thanks very much.
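One possible starting point, sketched in Python under stated assumptions: treat two sites as the same "unique site" when their great-circle (haversine) distance is at most 0.5 miles, and count the connected groups with a union-find. This is single-linkage grouping, so a chain of sites each under 0.5 miles from the next collapses into one group even if its two ends are farther apart; whether that is the right rule is a business decision. The coordinates below are made-up sample values and the function names are hypothetical. If the database is PostgreSQL with PostGIS, `ST_ClusterDBSCAN` may be worth looking at for doing a similar grouping in SQL.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def count_site_clusters(sites, threshold_miles=0.5):
    """Count groups of sites linked by pairwise distances <= threshold (union-find)."""
    parent = list(range(len(sites)))

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pair check: fine for the sites of a single customer.
    for i in range(len(sites)):
        for j in range(i + 1, len(sites)):
            if haversine_miles(*sites[i], *sites[j]) <= threshold_miles:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    return len({find(i) for i in range(len(sites))})


close_sites = [(40.0000, -75.0000), (40.0010, -75.0000), (40.0020, -75.0010), (40.0030, -75.0000)]
print(count_site_clusters(close_sites))                         # 1
print(count_site_clusters(close_sites + [(40.1000, -75.0000)]))  # 2 (about 7 miles away)
```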

Using theta sketches to count ad impressions and unique users

We're currently serving 250 billion ad impressions per day across our 6 data centers. Of these, about 180 billion are served in the US alone.
Each ad impression can have hundreds of attributes (dimensions), e.g. Country, City, Browser, OS, custom parameters from the web page, ad-size, ad-id, site-id, etc.
Currently, we don't have a data warehouse, and ad-hoc OLAP support is pretty much non-existent in our organization. This severely limits our ability to run ad-hoc queries and get a quick grasp of the data.
To begin with, we want to answer the following two queries:
Q1) Find the total count of ad impressions served from "beginDate" to "endDate" where Dimension1 = d1 and Dimension2 = d2 and ... and DimensionK = dK.
Q2) Find the total count of unique users who saw our ads from "beginDate" to "endDate" where Dimension1 = d1 and/or Dimension2 = d2 ... DimensionK = dK.
As I said, each impression can have hundreds of dimensions (listed above), and the cardinality of each dimension can range from a few hundred (e.g. Country) to billions (e.g. User-id).
Approximate answers are acceptable; we want the lowest possible infrastructure cost and a query response time under 5 minutes. I am thinking about using Druid and Apache DataSketches (Theta Sketch, to be precise) to answer Q2, with the following data model:
Date        Dimension Name  Dimension Value  Unique-User-ID (Theta sketch)
2021/09/12  "Country"       "US"             37873-3udif-83748-2973483
2021/09/12  "Browser"       "Chrome"         37873-3aeuf-83748-2973483
...
<Other records>
So after roll-up, I would end up with one theta sketch per dimension value per day (assuming day-level granularity), and I can do unions and intersections on these sketches to answer Q2.
I am planning to set k (nominal entries) to 10^5. Please comment on what a suitable k would be for this use case and how much storage to expect.
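To make the k trade-off concrete, here is a toy KMV ("k minimum values") sketch in Python, the idea theta sketches are built on: hash every user id to [0, 1), keep only the k smallest hashes, and estimate the distinct count as retained / theta. Unions keep the k smallest hashes of the merged set; intersections keep hashes present in both below the smaller theta. This is a simplified illustration, not the DataSketches implementation, and the class and function names are hypothetical. Relative error for a single sketch is roughly 1/sqrt(k), each retained hash costs about 8 bytes, and intersections of a small set with a large one are much noisier than unions. Also, Druid's thetaSketch aggregator requires its `size` to be a power of 2, so 10^5 would become 65536 or 131072.

```python
import hashlib
import heapq


def hash01(item):
    """Hash an item to a float in [0, 1) using the top 53 bits of a 64-bit digest."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return (int.from_bytes(digest, "big") >> 11) / float(1 << 53)


class ToyThetaSketch:
    """Keeps the k smallest hashes seen; theta is the sampling threshold."""

    def __init__(self, k):
        self.k = k
        self.theta = 1.0      # 1.0 means nothing discarded yet (exact mode)
        self.retained = set()  # hashes strictly below theta
        self._heap = []        # negated hashes, so -heap[0] is the largest retained

    def update(self, item):
        h = hash01(item)
        if h >= self.theta or h in self.retained:
            return  # outside the sample, or a duplicate
        self.retained.add(h)
        heapq.heappush(self._heap, -h)
        if len(self.retained) > self.k:
            largest = -heapq.heappop(self._heap)
            self.retained.discard(largest)
            self.theta = largest  # all retained hashes are now below theta

    def estimate(self):
        return len(self.retained) / self.theta


def union_estimate(a, b, k):
    """Estimate |A union B|: merge below the smaller theta, then trim to k."""
    theta = min(a.theta, b.theta)
    merged = sorted(h for h in a.retained | b.retained if h < theta)
    if len(merged) > k:
        theta = merged[k]
        merged = merged[:k]
    return len(merged) / theta


def intersection_estimate(a, b):
    """Estimate |A intersect B| from hashes present in both, below the smaller theta."""
    theta = min(a.theta, b.theta)
    return len([h for h in a.retained & b.retained if h < theta]) / theta


k = 4096
us_users, chrome_users = ToyThetaSketch(k), ToyThetaSketch(k)
for i in range(60000):
    us_users.update(f"user-{i}")
for i in range(40000, 100000):
    chrome_users.update(f"user-{i}")
print(round(us_users.estimate()), round(union_estimate(us_users, chrome_users, k)),
      round(intersection_estimate(us_users, chrome_users)))
```

The true values here are 60000, 100000 and 20000; with k = 4096 the estimates should land within a few percent.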
I've also read about the accuracy of theta sketch set operations here.
I would like to know if there is a better approach to solving Q2 (with or without Druid).
I would also like to know how I can solve Q1.
If I replace Unique-User-Id with "Impression-Id", can I use the same data model to answer Q1? I believe that if I did, the accuracy of the total impression count would be much worse than for Q2, because each ad impression is assigned a unique id and we are currently serving 250 billion per day.
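Q1 is a plain count rather than a distinct count, so it should not need a sketch at all: if each stored row carries a count of the impressions it represents, summing that count over the matching rows gives an exact answer (this is how roll-up with a count metric works in Druid). Because Q1 ANDs several dimensions, the stored key has to be the whole combination of dimension values, not one row per dimension as in the Q2 model above; roll-up then compresses well only if very high-cardinality columns such as Impression-Id or User-id are left out of the key. The Python below is a toy illustration; `rollup` and `count_impressions` are hypothetical helpers, not Druid APIs.

```python
from collections import Counter


def rollup(impressions, dims):
    """Pre-aggregate raw impressions into (date, dim values...) -> impression count."""
    counts = Counter()
    for imp in impressions:
        key = (imp["date"],) + tuple(imp[d] for d in dims)
        counts[key] += 1
    return counts


def count_impressions(rolled, dims, begin, end, filters):
    """Exact count: sum the stored counts of rows in the date range matching every filter."""
    pos = {d: i + 1 for i, d in enumerate(dims)}  # position of each dimension in the key
    total = 0
    for key, n in rolled.items():
        if not (begin <= key[0] <= end):
            continue
        if all(key[pos[d]] == v for d, v in filters.items()):
            total += n
    return total


dims = ["Country", "Browser"]
raw = [
    {"date": "2021/09/12", "Country": "US", "Browser": "Chrome"},
    {"date": "2021/09/12", "Country": "US", "Browser": "Chrome"},
    {"date": "2021/09/12", "Country": "US", "Browser": "Firefox"},
    {"date": "2021/09/13", "Country": "DE", "Browser": "Chrome"},
]
rolled = rollup(raw, dims)  # 4 raw impressions become 3 stored rows
print(count_impressions(rolled, dims, "2021/09/12", "2021/09/13",
                        {"Country": "US", "Browser": "Chrome"}))  # 2
```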
Please share your thoughts about solving Q1 and Q2.
Regards
kartik

MySQL Workbench - script storing return in array and performing calculations?

Firstly, this is part of my college homework.
Now that that's out of the way: I need to write a query that returns the number of free apps in a DB as a percentage of the total number of apps, grouped by the app's category.
I can get the number of free apps and the total number of apps per category. Now I need to find the percentage, and this is where it goes a bit pear-shaped.
Here is what I have so far:
-- find total number of apps per category
select @totalAppsPerCategory := count(*), category_primary
from apps
group by category_primary;
-- find number of free apps per category
select @freeAppsPerCategory := count(*), category_primary
from apps
where (price = 0.0)
group by category_primary;
-- find percentage of free apps per category
set @totals = @freeAppsPerCategory / @totalAppsPerCategory * 100;
select @totals, category_primary
from apps
group by category_primary;
It then lists the categories, but the percentage shown for every category is exactly the same.
I had initially thought of using an array, but from what I have read, MySQL does not seem to support arrays.
I'm a bit lost as to how to proceed from here.
Finally figured it out. Because I had saved the earlier results in user variables, the percentage could not be calculated on a row-by-row basis: a user variable holds only a single value, not one value per category, which is why all the percentages were identical. So the calculation needed to be part of the query itself.
Here's what I came up with:
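SELECT
    category_primary,
    CONCAT(FORMAT(COUNT(CASE
        WHEN price = 0 THEN 1
    END) / COUNT(*) * 100,
    1),
    '%') AS FreeAppSharePercent
FROM
    apps
GROUP BY category_primary
-- sort on the numeric ratio; sorting the formatted string would put '9.5%' above '50.0%'
ORDER BY COUNT(CASE WHEN price = 0 THEN 1 END) / COUNT(*) DESC;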
Then the query result is:
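If you want to check the per-row logic without a MySQL server, here is a sketch using Python's built-in sqlite3 module with made-up sample rows (the `apps` columns beyond `category_primary` and `price` are assumptions). One difference from MySQL: SQLite's `/` does integer division when both operands are integers, so the sketch multiplies by `100.0` first, whereas MySQL's `/` already returns a decimal.

```python
import sqlite3


def free_app_share(rows):
    """Return (category, percent of free apps) per category, highest share first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE apps (name TEXT, category_primary TEXT, price REAL)")
    conn.executemany("INSERT INTO apps VALUES (?, ?, ?)", rows)
    query = """
        SELECT category_primary,
               -- COUNT ignores NULLs, so only free apps are counted here
               ROUND(COUNT(CASE WHEN price = 0 THEN 1 END) * 100.0 / COUNT(*), 1) AS free_pct
        FROM apps
        GROUP BY category_primary
        ORDER BY free_pct DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [
    ("a", "Games", 0.0), ("b", "Games", 0.0), ("c", "Games", 1.99), ("d", "Games", 0.0),
    ("e", "Tools", 0.0), ("f", "Tools", 2.99),
    ("g", "Music", 4.99),
]
print(free_app_share(sample))  # [('Games', 75.0), ('Tools', 50.0), ('Music', 0.0)]
```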

Tableau calculated field-formula

I am trying to create a simple calculated field for a table.
This is the table structure:
EXPERIENCE  PLATFORM  REVENUE
PC          WEBSITE   100
MOBILE      ANDROID   20
MOBILE      IPAD      10
MOBILE      IPHONE    20
My calculations are trying to find these shares:
Calc1: Share of site = sum of revenue from mobile / sum of revenue (mobile + PC)
Calc2: Share of platform = sum of revenue from apps / revenue from mobile
Can someone show me how to create these formulas? I am very new to Tableau.
Thanks
For Calc1, try a calculated field like:
sum(iif([Experience]='MOBILE',[Revenue],0))/sum([Revenue])
Calc2 uses similar logic:
sum(iif([Platform]='ANDROID',[Revenue],0))/sum(iif([Experience]='MOBILE',[Revenue],0))
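As a cross-check of the arithmetic, here is the same conditional-sum logic in Python on the four sample rows: `SUM(IIF(cond, [Revenue], 0))` simply adds revenue only where the condition holds. On this data Calc1 = 50 / 150 ≈ 0.333 and Calc2 = 20 / 50 = 0.4. The function names are illustrative only.

```python
# (EXPERIENCE, PLATFORM, REVENUE) rows from the question
rows = [
    ("PC", "WEBSITE", 100),
    ("MOBILE", "ANDROID", 20),
    ("MOBILE", "IPAD", 10),
    ("MOBILE", "IPHONE", 20),
]


def calc1_share_of_site(rows):
    """sum(iif([Experience]='MOBILE',[Revenue],0)) / sum([Revenue])"""
    mobile = sum(rev if exp == "MOBILE" else 0 for exp, _, rev in rows)
    total = sum(rev for _, _, rev in rows)
    return mobile / total


def calc2_share_of_platform(rows, platform="ANDROID"):
    """sum(iif([Platform]=platform,[Revenue],0)) / sum(iif([Experience]='MOBILE',[Revenue],0))"""
    plat = sum(rev if p == platform else 0 for _, p, rev in rows)
    mobile = sum(rev if exp == "MOBILE" else 0 for exp, _, rev in rows)
    return plat / mobile


print(calc1_share_of_site(rows))      # 0.333...
print(calc2_share_of_platform(rows))  # 0.4
```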
You should also be able to get what you need using Tableau's built-in table calculations, by dragging in the fields you need and adjusting filters as necessary.

SQL Query sort by closest match

We have a Locations search page that is giving us a challenge I've never run across before.
In our database, we have a list of cities, states, etc. with the corresponding geocodes. All was working fine until now...
We have two locations in a city named "Black River Falls, WI" and we've recently opened one in "River Falls, WI".
So our table has records as follows:
Location  City               State
--------  -----------------  -----
1         Black River Falls  WI
2         Black River Falls  WI
3         River Falls        WI
Our query uses a LIKE clause to match the city, but when a customer searches for the text "River Falls", the first results shown are always "Black River Falls".
In our application, we always use the first match, and use it as the default. (We could change it, but it would be a lot of un-budgeted work)
I know I could simply change the sort order to have "River Falls" come up first, but that's a sloppy solution that only works in this one case.
What I'm wondering is whether there is a way, through T-SQL (SQL Server 2008 R2), to sort by "best match", so that "River Falls" would "win" if we search for "River Falls, WI" and "Black River Falls" would win if we search for "Black River Falls, WI".
You can use the "DIFFERENCE" function to search using the closest SOUNDEX match.
SELECT * FROM Locations WHERE City LIKE '%' + @City + '%' ORDER BY DIFFERENCE(City, @City) DESC
From the MSDN Documentation:
The integer returned is the number of characters in the SOUNDEX values
that are the same. The return value ranges from 0 through 4: 0
indicates weak or no similarity, and 4 indicates strong similarity or
the same values.
DIFFERENCE and SOUNDEX are collation sensitive.
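To get a feel for how DIFFERENCE scores these names, here is a Python sketch of classic American Soundex plus a position-by-position comparison modeled on the description quoted above. SQL Server's own SOUNDEX and DIFFERENCE may differ in edge cases (for example in how spaces are handled), so treat the numbers as illustrative. Because Soundex keeps the first letter and encodes only the next three consonant sounds, this sketch scores "River Falls" against "Black River Falls" as 0, so a longer name that merely contains the search text is not favoured.

```python
# Classic American Soundex digit groups.
CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """American Soundex: first letter plus three digits, ignoring non-letters."""
    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""
    result = word[0]
    last = CODES.get(word[0], "")
    for ch in word[1:]:
        code = CODES.get(ch, "")
        if code and code != last:
            result += code
            if len(result) == 4:
                break
        if ch not in "HW":  # H and W do not separate equal codes; vowels do
            last = code
    return (result + "000")[:4]


def difference(a, b):
    """Number of matching positions in the two Soundex codes (0 to 4)."""
    return sum(1 for x, y in zip(soundex(a), soundex(b)) if x == y)


print(soundex("River Falls"), soundex("Black River Falls"))  # R161 B426
print(difference("River Falls", "Black River Falls"))        # 0
```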
Like this:
;WITH cte As
(
SELECT *
, ROW_NUMBER() OVER(ORDER BY LEN(City)-LEN(@UserText)) As MatchPrio
FROM Cities
WHERE City LIKE '%'+@UserText+'%'
)
SELECT *
FROM cte
WHERE MatchPrio = 1
Update:
You can change the ORDER BY expression above to also use DIFFERENCE(..) or any other combination of criteria.
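The ranking idea in the CTE can be sketched in Python to show why it picks the right row: keep cities that contain the search text (like `LIKE '%' + @UserText + '%'`), then prefer the one with the fewest extra characters (like ordering by `LEN(City) - LEN(@UserText)` and keeping `MatchPrio = 1`). The sketch assumes a case-insensitive comparison, as with most default SQL Server collations, and that the user text contains only the city part (no ", WI"); `best_match` is a hypothetical helper.

```python
def best_match(cities, user_text):
    """Return the city containing user_text with the fewest leftover characters."""
    needle = user_text.lower()
    candidates = [c for c in cities if needle in c.lower()]  # the LIKE '%text%' filter
    if not candidates:
        return None
    # Rank 1 = smallest LEN(City) - LEN(@UserText); ties keep the first row seen.
    return min(candidates, key=lambda c: len(c) - len(user_text))


cities = ["Black River Falls", "Black River Falls", "River Falls"]
print(best_match(cities, "River Falls"))        # River Falls
print(best_match(cities, "Black River Falls"))  # Black River Falls
```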