Consider the following setup:
CREATE TABLE person (
first_name varchar,
last_name varchar,
age INT
);
INSERT INTO person (first_name, last_name, age)
VALUES ('pete', 'peterson', 16),
('john', 'johnson', 20),
('dick', 'dickson', 42),
('rob', 'robson', 30);
CREATE OR REPLACE VIEW adult_view AS
SELECT
first_name || ' ' || last_name as full_name,
age
FROM person;
If I run:
SELECT *
FROM adult_view
WHERE age > 18;
Will the view generate the full_name column for pete even though he gets filtered out?
Similarly, if I run:
SELECT age
FROM adult_view;
Will the view generate any full_name columns at all?
Concerning your first query:
EXPLAIN (VERBOSE, COSTS OFF)
SELECT full_name
FROM adult_view
WHERE age > 18;
QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════
Seq Scan on laurenz.person
Output: (((person.first_name)::text || ' '::text) || (person.last_name)::text)
Filter: (person.age > 18)
Query Identifier: 5675263059379476127
So the table is scanned, rows with an age of 18 or less are filtered out, and only then is the output expression calculated for the rows that remain. full_name is therefore never computed for pete.
Concerning your second query:
EXPLAIN (VERBOSE, COSTS OFF)
SELECT age
FROM adult_view
WHERE age > 18;
QUERY PLAN
════════════════════════════════════════
Seq Scan on laurenz.person
Output: person.age
Filter: (person.age > 18)
Query Identifier: -8981994317495194105
(4 rows)
full_name is not calculated at all.
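The second plan above was taken with the WHERE clause still in place. A minimal way to check the query exactly as it was asked, assuming the person table and adult_view from the question already exist, is to run EXPLAIN without the filter. The Output line lists what is actually computed, and for a plain view like this one it should show only the age column:

```sql
-- Assumes the person table and adult_view from the question already exist.
-- VERBOSE adds the "Output:" line, which lists every expression the scan computes.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT age
FROM adult_view;
```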
I have a table with approximately 7 million records. The table has a first_name and last_name column which I want to search on using the levenshtein() distance function.
select levenshtein('JOHN', first_name) as fn_distance,
levenshtein('DOE', last_name) as ln_distance,
id,
first_name as "firstName",
last_name as "lastName"
from person
where first_name is not null
and last_name is not null
and levenshtein('JOHN', first_name) <= 2
and levenshtein('DOE', last_name) <= 2
order by 1, 2
limit 50;
The above search is slow (4-5 seconds). What can I do to improve performance? Should I create indexes on the two columns, or do something else?
After I added the indexes below:
create index first_name_idx on person using gin (first_name gin_trgm_ops);
create index last_name_idx on person using gin(last_name gin_trgm_ops);
The query now takes ~11 secs. :(
New query:
select similarity('JOHN', first_name) as fnsimilarity,
similarity('DOW', last_name) as lnsimilarity,
first_name as "firstName",
last_name as "lastName",
npi
from person
where first_name is not null
and last_name is not null
and similarity('JOHN', first_name) >= 0.2
and similarity('DOW', last_name) >= 0.2
order by 1 desc, 2 desc, npi
limit 50;
There is no built-in index type that supports levenshtein distances. I'm not aware of any 3rd party index implementations to do so either.
Another string similarity measure, trigram similarity, does have an index method to support it. Maybe you can switch to using that measure instead.
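You need to write the query using the % operator, not the similarity function, because the index can only be used for the operator and not for a function call in the WHERE clause. So it would look something like this: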
set pg_trgm.similarity_threshold TO 0.2;
select similarity('JOHN', first_name) as fnsimilarity,
similarity('DOW', last_name) as lnsimilarity,
first_name as "firstName",
last_name as "lastName",
npi
from person
where first_name is not null
and last_name is not null
and 'JOHN' % first_name
and 'DOW' % last_name
order by 1 desc, 2 desc, npi
limit 50;
But note that 0.2 is a very low cutoff, and the lower the cutoff, the less efficient the index.
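To see whether the trigram indexes are actually used, and how the cutoff affects run time, the query can be run under EXPLAIN with a different threshold. This is only a sketch: the value 0.4 is an arbitrary illustration, not a recommendation, and it assumes the pg_trgm extension and the two GIN indexes from the question are in place.

```sql
-- Assumption: pg_trgm can be installed in this database (no-op if it already is).
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- 0.4 is an illustrative cutoff only; tune it against real data.
SET pg_trgm.similarity_threshold = 0.4;

-- ANALYZE executes the query and reports actual timings;
-- BUFFERS shows how many pages were read.
EXPLAIN (ANALYZE, BUFFERS)
SELECT similarity('JOHN', first_name) AS fnsimilarity,
       similarity('DOW', last_name)   AS lnsimilarity,
       first_name AS "firstName",
       last_name  AS "lastName",
       npi
FROM person
WHERE first_name IS NOT NULL
  AND last_name IS NOT NULL
  AND 'JOHN' % first_name
  AND 'DOW' % last_name
ORDER BY 1 DESC, 2 DESC, npi
LIMIT 50;
```

If the plan still shows a sequential scan on person rather than a bitmap scan on first_name_idx or last_name_idx, the planner has judged the condition too unselective for the index to pay off.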
I have a table with more than 100 million rows that looks like this:
Create table member (
id bigint,
gender text,
-- ... other fields
primary key (id)
);
The gender field has two possible values, 'M' or 'F'.
Whenever I use the gender field, the query takes too much time. I have indexes on other fields such as id, member details and mobile number.
select
count(1) filter (where mod.is_active and m.gender = 'M') as male,
count(1) filter (where mod.is_active and m.gender = 'F') as female
from member_other_details mod
inner join member m on m.id = mod.member_id
This query is taking hours to complete.
How can I optimize this?
Personally, I would run this query instead:
select m.gender,count(*)
from member_other_details mod inner join member m on m.id = mod.member_id
where mod.is_active
group by m.gender
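If the result is still needed as a single row with a male and a female column, as in the question, the same idea (filtering on is_active in WHERE so that inactive rows are dropped before the aggregation) can be combined with FILTER. This is a sketch that assumes PostgreSQL; the index name is hypothetical, and whether the planner uses the index depends on how many rows are active.

```sql
-- Same counts as the original query, but inactive rows are discarded
-- by WHERE before the aggregate sees them.
SELECT count(*) FILTER (WHERE m.gender = 'M') AS male,
       count(*) FILTER (WHERE m.gender = 'F') AS female
FROM member_other_details mod
INNER JOIN member m ON m.id = mod.member_id
WHERE mod.is_active;

-- Optional and hypothetical: a partial index that contains only active rows,
-- which may help if active rows are a small fraction of the table.
CREATE INDEX member_other_details_active_idx
    ON member_other_details (member_id)
    WHERE is_active;
```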
I need to create a data structure like this:
Table 1
Code, Value, Offer_ID
I am creating a service that, for a given combination of "Code" and "Value", must return an Offer_ID that I preconfigured.
For example:
Code Value Offer_ID
------ ------- ----------
Age 30 OFF1
Age 30 OFF2
Province RM OFF2
Age 40 OFF3
Province TO OFF3
Age 40 OFF4
Province TO OFF4
Operator TIM OFF4
The calling service always calls me passing the Age, Province and operator values.
I have to look up the most specific Offer_ID in this table: one that matches all three values together (like OFF4), or two of them (like OFF3), or Age alone, which is the only mandatory value (like OFF1).
So if the client passes me Province BO and operator WIND I have to return OFF1
How can I do this? How should I structure the tables and the query?
I hope I was able to explain the problem.
Many thanks to anyone who can help. We are going crazy!
Try this:
with tab (age, province, operator, offer_id) as (values
(30, null, null, 'OFF1')
, (30, 'RM', null, 'OFF2')
, (40, 'TO', null, 'OFF3')
, (40, 'TO', 'TIM', 'OFF4')
)
, op_inp (age, province, operator) as (values
--(40, 'TO', 'TIM') --'OFF4'
(40, 'TO', 'VODAFONE') --'OFF3'
--(30, 'RM', 'VODAFONE') --'OFF2'
--(30, 'TO', 'VODAFONE') --'OFF1'
)
select offer_id /*Just for info*/, order_flag
from
(
select t.*, 3 as order_flag
from tab t
join op_inp o on o.age=t.age and o.province=t.province and o.operator=t.operator
union all
select t.*, 2 as order_flag
from tab t
join op_inp o on o.age=t.age and o.province=t.province and t.operator is null
union all
select t.*, 1 as order_flag
from tab t
join op_inp o on o.age=t.age and t.province is null and t.operator is null
)
order by order_flag desc
fetch first 1 row only
;
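The same lookup can also be sketched as a single join that treats a null column in the rule table as "matches anything" and ranks the matching rules by how many non-null columns they have. This is an alternative formulation, not the query above; it reuses the tab and op_inp names and the sample rows from that query, and assumes a dialect that accepts fetch first.

```sql
with tab (age, province, operator, offer_id) as (values
  (30, null, null, 'OFF1')
, (30, 'RM', null, 'OFF2')
, (40, 'TO', null, 'OFF3')
, (40, 'TO', 'TIM', 'OFF4')
)
, op_inp (age, province, operator) as (values
  (40, 'TO', 'VODAFONE')
)
select t.offer_id
from tab t
join op_inp o
  on o.age = t.age
 -- a null in the rule means "any value is accepted for this column"
 and (t.province is null or t.province = o.province)
 and (t.operator is null or t.operator = o.operator)
-- the most specific rule (most non-null columns) sorts first
order by
    case when t.province is not null then 1 else 0 end
  + case when t.operator is not null then 1 else 0 end desc
fetch first 1 row only
;
-- for the input (40, 'TO', 'VODAFONE') this returns OFF3
```

Only OFF3 survives the join for this input, because OFF4 requires operator TIM. For the input (40, 'TO', 'TIM') both OFF3 and OFF4 match, and OFF4 wins on specificity.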
I have a table TaggedData with the following fields and data
ID GroupID Tag MyData
** ******* *** ******
1 Texas AA01 Peanut Butter
2 Texas AA15 Cereal
3 Ohio AA05 Potato Chips
4 Texas AA08 Bread
I have a second table of BlockedTags as follows:
ID StartTag EndTag
** ******** ******
1 AA00 AA04
2 AA15 AA15
How do I select from this to return all data matching a given GroupId but NOT in any blocked range (inclusive)? For the data given if the GroupId is Texas, I don't want to return Cereal because it matches the second range. It should only return Bread.
I did try left joins based queries but I'm not even that close.
Thanks
create table TaggedData (
ID int,
GroupID varchar(16),
Tag char(4),
MyData varchar(50))
create table BlockedTags (
ID int,
StartTag char(4),
EndTag char(4)
)
insert into TaggedData(ID, GroupID, Tag, MyData)
values (1, 'Texas', 'AA01', 'Peanut Butter')
insert into TaggedData(ID, GroupID, Tag, MyData)
values (2, 'Texas' , 'AA15', 'Cereal')
insert into TaggedData(ID, GroupID, Tag, MyData)
values (3, 'Ohio ', 'AA05', 'Potato Chips')
insert into TaggedData(ID, GroupID, Tag, MyData)
values (4, 'Texas', 'AA08', 'Bread')
insert into BlockedTags(ID, StartTag, EndTag)
values (1, 'AA00', 'AA04')
insert into BlockedTags(ID, StartTag, EndTag)
values (2, 'AA15', 'AA15')
select t.* from TaggedData t
left join BlockedTags b on t.Tag between b.StartTag and b.EndTag
where b.ID is null
Returns:
ID GroupID Tag MyData
----------- ---------------- ---- --------------------------------------------------
3 Ohio AA05 Potato Chips
4 Texas AA08 Bread
(2 row(s) affected)
So, to match on a given GroupID, change the query like this:
select t.* from TaggedData t
left join BlockedTags b on t.Tag between b.StartTag and b.EndTag
where b.ID is null and t.GroupID = @GivenGroupID
I prefer NOT EXISTS because it is more readable and usually performs better on large data sets (in several cases it gets a better execution plan).
It would look like this:
SELECT * from TaggedData
WHERE GroupID = @GivenGroupID
AND NOT EXISTS (SELECT 1 FROM BlockedTags WHERE Tag BETWEEN StartTag AND EndTag)
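For a version that can be pasted and run against the sample tables created above, here is a sketch in T-SQL with the parameter declared as a local variable and explicit table aliases. The variable name and the value 'Texas' are only illustrations taken from the question.

```sql
-- Hypothetical parameter; in an application this would be passed in.
DECLARE @GivenGroupID varchar(16) = 'Texas';

SELECT t.ID, t.GroupID, t.Tag, t.MyData
FROM TaggedData AS t
WHERE t.GroupID = @GivenGroupID
  -- keep the row only if no blocked range contains its tag (bounds inclusive)
  AND NOT EXISTS (SELECT 1
                  FROM BlockedTags AS b
                  WHERE t.Tag BETWEEN b.StartTag AND b.EndTag);
-- With the sample data this returns only row 4 (AA08, Bread):
-- AA01 falls in AA00..AA04 and AA15 falls in AA15..AA15.
```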
I'm new to Oracle. I have to get the firstname and salary of the record with the second-highest salary, using subqueries.
I've tried below query:
select max(salary)
from employees
where salary < (select max(salary)
from employees);
This query gets the second-highest salary from the table. Now I also have to get the firstname of the record with that salary.
firstname salary
-------------------
mani 45666
vijay 50000
sanjay 65000
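SELECT firstname, salary FROM
(SELECT e.*, rownum AS rn FROM
(SELECT * FROM employees ORDER BY salary DESC) e)
WHERE rn = 2;
The innermost SELECT sorts the table by salary, in order from greatest to least (hence DESC).
The middle SELECT numbers the sorted rows with rownum. This extra level is needed because a direct WHERE rownum = 2 never returns a row: rownum is assigned only as rows pass the filter, so the first candidate always gets 1 and is rejected.
The outer SELECT takes the two fields you want from row 2 (which holds the second-highest salary) of the sorted table.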
You can use dense_rank for this.
select firstname, salary
from (select /*+ first_rows(2) */ firstname, salary,
dense_rank() Over (order by salary desc) r
from employees)
where r = 2;
The first_rows hint is there to help it use an index (an index on (salary) or on (salary, firstname)).
This may return > 1 row if 2 people happen to share the same salary (you can add and rownum = 1 to pick just one at random).
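Since the question asks specifically for subqueries, the same result can also be sketched with nested MAX subqueries, building on the query from the question. It assumes the employees table with the firstname and salary columns shown there.

```sql
-- Innermost subquery: the highest salary.
-- Middle subquery: the highest salary strictly below it (the second-highest).
-- Outer query: every employee who earns exactly that salary.
SELECT firstname, salary
FROM employees
WHERE salary = (SELECT MAX(salary)
                FROM employees
                WHERE salary < (SELECT MAX(salary)
                                FROM employees));
-- With the sample rows from the question this returns vijay, 50000.
```

Like the dense_rank version, this returns more than one row when several employees share the second-highest salary.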
Try this out:
SELECT * FROM EMP
WHERE SAL >= (SELECT MAX(SAL) FROM EMP
WHERE SAL < (SELECT MAX(SAL) FROM EMP
WHERE SAL < (SELECT MAX(SAL) FROM EMP)))
AND ROWNUM < 4
ORDER BY SAL