TSQL selecting unique value from multiple ranges in a column

A question from a beginner.
I have two tables. One (A) contains Start_time, End_time, Status. The second one (B) contains Timestamp, Error_code. The second table is logged automatically by the system every few seconds, so it contains lots of non-unique Error_code values (the code changes randomly, but stays within a time range from table A). What I need is to select the unique error codes for every time range (in my case every row) of the first table, returning:
A.Start_time, A.End_time, B.Error_code.
I have come to this:
select A.Start_time,
A.End_time,
B.Error_code
from B
inner join A
on B.Timestamp between A.Start_time and A.End_time
This is wrong, I know.
Any thoughts are welcome.

If your query gives a lot of duplicates, use DISTINCT to remove them:
select DISTINCT A.Start_time, A.End_time, B.Error_code
from B
inner join A on B.Timestamp between A.Start_time and A.End_time
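For a quick sanity check, here is a minimal, self-contained sketch of that answer; the table variables, column types, and sample values are made up, since the real definitions weren't posted:

-- Minimal repro with assumed types: two ranges in @A, repeated log rows in @B.
DECLARE @A TABLE (Start_time datetime, End_time datetime, Status varchar(20));
DECLARE @B TABLE ([Timestamp] datetime, Error_code int);

INSERT INTO @A VALUES ('2023-01-01 10:00', '2023-01-01 11:00', 'Run'),
                      ('2023-01-01 11:00', '2023-01-01 12:00', 'Stop');
INSERT INTO @B VALUES ('2023-01-01 10:05', 7), ('2023-01-01 10:10', 7),
                      ('2023-01-01 10:20', 9), ('2023-01-01 11:30', 7);

-- DISTINCT collapses the repeated (range, error code) combinations:
SELECT DISTINCT A.Start_time, A.End_time, B.Error_code
FROM @B AS B
INNER JOIN @A AS A
    ON B.[Timestamp] BETWEEN A.Start_time AND A.End_time;
-- Returns three rows: (10:00-11:00, 7), (10:00-11:00, 9), (11:00-12:00, 7).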

Related

Unexpected sort order on postgres left outer join

Background
I'm using Postgres 11 and pgAdmin4 v5.2. The problem I describe below is on my dev machine, which runs both the Postgres server and the pgAdmin client.
Questions I've looked at on SO that deal with incorrect ordering seem to involve collation issues with the ordering of text fields, whereas my problem is on an integer field.
Setup
I have a table norm_plans that contains ~5k records.
Column   | Type
---------+------------------------
canon_id | integer
name     | character varying(200)
(other fields omitted)
canon_id is autopopulated using a sequence.
I've created a new table norm_plans_cmp as a copy of norm_plans (CREATE TABLE norm_plans_cmp AS TABLE norm_plans WITH DATA;)
I next insert some new records into norm_plans and update some existing records (fields other than canon_id).
The new records increment the sequence and are assigned canon_id values as expected.
I now want to compare norm_plans against norm_plans_cmp so I perform a left outer join:
select a.*, b.*
from norm_plans a
left outer join norm_plans_cmp b
on a.canon_id = b.canon_id
order by a.canon_id
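Condensed, the setup and comparison described above amount to something like this; the INSERT/UPDATE statements are placeholders, since the actual changes weren't posted:

CREATE TABLE norm_plans_cmp AS TABLE norm_plans WITH DATA;

-- ... new rows are inserted into norm_plans (canon_id comes from the sequence)
-- ... some existing rows are updated (fields other than canon_id)

SELECT a.*, b.*
FROM norm_plans a
LEFT OUTER JOIN norm_plans_cmp b ON a.canon_id = b.canon_id
ORDER BY a.canon_id;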
Problem
I would expect records to be sorted by canon_id. This holds true from 1 to 2,000, but after 2,000 I get canon_ids from 5,001 to 5,111 (which is the last canon_id), and then it picks up again from 2,001. I'm viewing this data in pgAdmin; see screenshot 1 below showing the shift from 2,000 to 5,001, and screenshot 2 showing the transition from 5,111 back to 2,001.
Additional observations
While incorrect, the ordering seems consistent. Running the query multiple times results in the same (incorrect) ordering.
Despite my question title, I'm not totally sure the left join has anything to do with this.
Running SELECT * ... ORDER BY canon_id on norm_plans or norm_plans_cmp alone also results in incorrect ordering, albeit at different points in the order.
Answers to this SO question suggest index corruption may be a contributing problem, but I have no indexes on either norm_plans or norm_plans_cmp (canon_id is not defined as a PK).
At this point, I'm stumped!

PostgreSQL how to GROUP BY single field from returned table

So I have a complicated query; to simplify, let it be like:
SELECT
t.*,
SUM(a.hours) AS spent_hours
FROM (
SELECT
person.id,
person.name,
person.age,
SUM(contacts.id) AS contact_count
FROM
person
JOIN contacts ON contacts.person_id = person.id
) AS t
JOIN activities AS a ON a.person_id = t.id
GROUP BY t.id
Such a query works fine in MySQL, but Postgres needs to know that the GROUP BY field is unique, and even though it actually is, in this case I would need to GROUP BY all the fields returned from the t subquery.
I can do that, but I don't believe it will work efficiently with big data.
I can't JOIN activities directly in the first query, as a person can have several contacts, which would make the query count the activity hours several times, once for every joined contact.
Is there a Postgres way to make this query work? Maybe force Postgres to treat t.id as unique, or some other solution that achieves the same result the Postgres way?
This query will not work on either database system: there is an aggregate function in the inner query but you are not grouping by anything (unless you use window functions). Of course there is a special case for MySQL: you can make it run by disabling "sql_mode=only_full_group_by". So MySQL allows this usage because of its engine setting, but you cannot do that in PostgreSQL.
I knew MySQL allowed indeterminate grouping, but I honestly never knew how it implemented it... it always seemed imprecise to me, conceptually.
So depending on what that means (I'm too lazy to look it up), you might need one of two possible solutions, or maybe a third.
If your intent is to see all rows (perform the aggregate function but not consolidate/group rows), then you want a windowing function, invoked with partition by. Here is a really dumbed-down version in your query:
SELECT
t.*,
SUM (a.hours) over (partition by t.id) AS spent_hours
FROM t
JOIN activities AS a ON a.person_id = t.id
This means you want all records in table t, not one record per t.id. But each row will also contain a sum of the hours over all rows with that value of id.
For example the sum column would look like this:
Name   Hours  Sum Hours
-----  -----  ---------
Smith     20        120
Jones     30         30
Smith    100        120
Whereas a group by would have had Smith once and could not have displayed the hours column in detail.
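For completeness, here is roughly how that could look plugged back into the original query. This is only a sketch: the GROUP BY person.id added to the inner query assumes person.id is the primary key, in which case Postgres accepts person.name and person.age via functional dependency.

SELECT
    t.*,
    SUM(a.hours) OVER (PARTITION BY t.id) AS spent_hours
FROM (
    SELECT
        person.id,
        person.name,
        person.age,
        SUM(contacts.id) AS contact_count   -- kept as written in the question
    FROM person
    JOIN contacts ON contacts.person_id = person.id
    GROUP BY person.id
) AS t
JOIN activities AS a ON a.person_id = t.id;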
If you really did only want one row per t.id, then Postgres will require you to tell it how to determine which row. In the example above for Smith, do you want to see the 20 or the 100?
There is another possibility, but I think I'll let you reply first. My gut tells me option 1 is what you're after and you want the analytic function.

Oracle Sql - Discarding outer select if inner select returns null, and avoiding multiple rows

Pre-info: in our company a person is marked '*' if they are actively working. And there are people who have changed their departments.
For a report I use two tables named COMPANY_PERSON_ALL and trifm_izinler4, joined on the person_id field as below.
I want to discard (not list) the row if the first inner select returns null.
And I want to prevent the second inner select from returning multiple departments.
select izn.person_id, izn.adi_soyadi, izn.company_id,
(select a.employee_status from COMPANY_PERSON_ALL a where a.employee_status = '*' and a.person_id = izn.person_id) as Status,
(select a.org_code from COMPANY_PERSON_ALL a where a.person_id = izn.person_id) as Department,
izn.hizmet_suresi, izn.kalan_izin
from trifm_izinler4 izn
where trunc(rapor_tarihi) = trunc(SYSDATE)
Can you help me overcome these two problems with the inner select statements?
Assuming you only want to see the department from the active person record, you can just join the two tables instead of using subquery expressions, and filter on that status:
select izn.person_id, izn.adi_soyadi, izn.company_id,
a.employee_status as status, a.org_code as department,
izn.hizmet_suresi, izn.kalan_izin
from trifm_izinler4 izn
join company_person_all a on a.person_id = izn.person_id
where rapor_tarihi >= trunc(SYSDATE)
-- and rapor_tarihi < trunc(SYSDATE) + 1 -- probably not needed
and a.employee_status = '*'
I've also changed the date comparison; if you compare using trunc(rapor_tarihi) then a normal index on that column can't be used, so it's generally better to compare the original value against a range. Since you're comparing against today's date you probably only need to look for values greater than midnight today, but if that column can have future dates then you can put an upper bound on the range of midnight tomorrow - which I've included but commented out.
If a person can be active in more than one department at a time then this will show all of those, but your wording suggests people are only active in one at a time. If you want to see a department for all active users, but not necessarily the one that has the active flag (or if there can be more than one active), then it's a bit more complicated, and you need to explain how you would want to choose which to show.
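Purely as an illustration of what that more complicated case could look like: if more than one matching row were possible and you just needed a single arbitrary department per person, one option is ROW_NUMBER(). The ORDER BY inside the window is a placeholder for whatever rule you would actually use to decide which row wins:

select person_id, adi_soyadi, company_id, status, department,
       hizmet_suresi, kalan_izin
from (
    select izn.person_id, izn.adi_soyadi, izn.company_id,
           a.employee_status as status, a.org_code as department,
           izn.hizmet_suresi, izn.kalan_izin,
           row_number() over (partition by izn.person_id
                              order by a.org_code) as rn   -- placeholder ordering
    from trifm_izinler4 izn
    join company_person_all a on a.person_id = izn.person_id
    where rapor_tarihi >= trunc(SYSDATE)
    and a.employee_status = '*'
)
where rn = 1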

Using EXCEPT and flagging column differences

What I'm looking to do is select data from a Postgres table which does not appear in another. Both tables have identical columns, bar the use of boolean over varchar(1), but the issue is that the data in those columns does not match up.
I know I can do this with a SELECT EXCEPT SELECT statement, which I have implemented and is working.
What I would like to do is find a method to flag the columns that do not match up. As an idea, I have thought to append a character to the end of the data in the fields that do not match.
For example, if the updateflag is different in one table compared to the other, I would get back '* f' instead of 'f':
SELECT id, number, "updateflag" from dbc.person
EXCEPT
SELECT id, number, "updateflag"::bool from dbg.person;
Should I be joining the two tables together after executing this statement, to identify the differences from what's returned?
I have tried to research methods to implement this but have not found anything on the topic.
I prefer a full outer join for this
select *
from dbc.person p1
full join dbg.person p2 on p1.id = p2.id
where p1 is distinct from p2;
The id column is assumed to be the primary key column that "links" the two tables together.
This will only return rows where at least one column is different.
If you want to see the differences, you could use the hstore extension:
select hstore(p1) - hstore(p2) as columns_diff_p1,
hstore(p2) - hstore(p1) as columns_diff_p2
from dbc.person p1
full join dbg.person p2 on p1.id = p2.id
where p1 is distinct from p2;
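Note that hstore ships as a contrib extension, so it may need to be enabled once per database before the query above will run; the sample output below is made up, just to show the shape of the result:

-- Enable the extension (once per database):
CREATE EXTENSION IF NOT EXISTS hstore;

-- A differing row then comes back with only the mismatching columns, e.g.:
--   columns_diff_p1: "updateflag"=>"f"
--   columns_diff_p2: "updateflag"=>"t"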

LEFT JOIN returns incorrect result in PostgreSQL

I have two tables: A (525,968 records) and B (517,831 records). I want to generate a table with all the rows from A and the matched records from B. Both tables have an "id" column and a "year" column. The combination of id and year is unique in table A, but not in table B. So I wrote the following query:
SELECT
A.id,
A.year,
A.v1,
B.x1,
B.e1
FROM
A
LEFT JOIN B ON (A.id = B.id AND A.year = B.year);
I thought the result should contain the same total number of records as A, but it only returns about 517,950 records. I'm wondering what the possible cause may be.
Thanks!
First of all, I understand that this is an example, but Postgres may have issues with capital letters in table names.
Secondly, it may be a good idea to check how exactly you calculated the 525,968 records. The thing is, if you use some kind of database administration / query client, it may show you different / technical information about the tables (Postgres keeps internal row estimates that can differ from the actual number of records).
And finally, to check for yourself, do something like:
SELECT
count("A".id)
FROM
"A"