There are two tables.
tbl_1 is the source data:
ID | Submission_id
--------------------
1 | A00_1
2 | A00_2
3 | A00_3
4 | A00_4
5 | A00_5
6 | A00_6
7 | A00_7
tbl_2 is the destination. In this table, Submission_id is a unique key.
ID | Submission_id
--------------------
1 | A00_1
2 | A00_2
3 | A00_3
4 | A00_4
tbl_1 is the input and tbl_2 is the destination (sink). The expected result is that only A00_5, A00_6 and A00_7 are sent to tbl_2. I set this up with a Join followed by an AlterRow transformation.
Expected output:
tbl_2
ID | Submission_id
--------------------
1 | A00_1
2 | A00_2
3 | A00_3
4 | A00_4
5 | A00_5 -->(new)
6 | A00_6 -->(new)
7 | A00_7 -->(new)
However, the output from AlterRow contains every Submission_id. It should contain only the rows that satisfy the not-equal comparison stated in the alter row condition,
notEquals(DC__Submission_ID_BigInt, SrcStgDestination@{_Submission_ID}).
How can I solve this problem in Azure Data Flow using a Join?
I tried the same procedure and got the same result (all rows were inserted). The join itself worked as intended, but I couldn't get the required output from the alter row step. You can use the approach below instead, which also relies on a join.
In general, to get the records from table1 that are not present in table2, we run the following query (in SQL Server):
select t1.id,t1.submission_id from t1 left outer join t2 on t1.submission_id = t2.submission_id where t2.submission_id is NULL
In the data flow, the join works with the same setup as yours. Instead of the alter row transformation, I used a filter transformation to express the "t2.submission_id is NULL" condition. The filter expression is:
isNull(d1@submission_id) && isNull(d1@id)
Then configure the sink (tbl_2). The data preview should show only the new records.
Publish and run the data flow activity in your pipeline to get the desired results.
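The left outer join plus IS NULL logic can be checked outside Data Flow. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the real source and sink; the table and column names follow the question, and everything else (the connection, the insert step) is illustrative only.

```python
import sqlite3

# In-memory SQLite stands in for the real source and sink (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_1 (id INTEGER, submission_id TEXT);
    CREATE TABLE tbl_2 (id INTEGER, submission_id TEXT UNIQUE);
""")
conn.executemany("INSERT INTO tbl_1 VALUES (?, ?)",
                 [(i, f"A00_{i}") for i in range(1, 8)])
conn.executemany("INSERT INTO tbl_2 VALUES (?, ?)",
                 [(i, f"A00_{i}") for i in range(1, 5)])

# Anti-join: keep source rows that have no match in the destination.
new_rows = conn.execute("""
    SELECT t1.id, t1.submission_id
    FROM tbl_1 AS t1
    LEFT OUTER JOIN tbl_2 AS t2
        ON t1.submission_id = t2.submission_id
    WHERE t2.submission_id IS NULL
    ORDER BY t1.id
""").fetchall()

# Only the unmatched rows are written to the sink, so the unique key holds.
conn.executemany("INSERT INTO tbl_2 VALUES (?, ?)", new_rows)
final_rows = conn.execute(
    "SELECT id, submission_id FROM tbl_2 ORDER BY id").fetchall()

print(new_rows)
```

The filter transformation plays the role of the WHERE clause here: it keeps only the joined rows whose destination-side columns are null.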
One of my tables has duplicate values that I would like to group, exporting only the fields from the row whose "published_at" value is the most recent. When I use the MAX aggregate function, will the other fields I select come from the row where the maximum was found, or from the first row found in the table?
Let me demonstrate this with a simple example (in the real case I am also joining two different tables). I would like to group by id and extract all fields, but only from the row with the maximum published_at. My query would be:
SELECT "t1"."id", "t1"."field", MAX("t1"."published_at") as "published_at"
FROM "t1"
GROUP By "t1"."id"
| id | field | published_at |
---------------------------------
| 1 | document1 | 2022-01-10 |
| 1 | document2 | 2022-01-11 |
| 1 | document3 | 2022-01-12 |
The result I want is:
1 - document3 - 2022-01-12
One more question: why am I getting the error "ERROR: column "t1"."field" must appear in the GROUP BY clause or be used in an aggregate function"? Can I use the MAX function on a string column?
If you want the latest row for each id, you can use DISTINCT ON. For example:
select distinct on (id) *
from t
order by id, published_at desc
If you just want the latest row in the whole result set you can use LIMIT. For example:
select *
from t
order by published_at desc
limit 1
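DISTINCT ON is specific to PostgreSQL. As a hedged illustration of the same "latest row per id" result, the sketch below uses a different, portable technique: joining the table to a per-id MAX(published_at) subquery. It runs against an in-memory SQLite database, which is an assumption made only so that the example is self-contained; the sample rows are the ones from the question.

```python
import sqlite3

# In-memory SQLite is used only to make the sketch runnable (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, field TEXT, published_at TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", [
    (1, "document1", "2022-01-10"),
    (1, "document2", "2022-01-11"),
    (1, "document3", "2022-01-12"),
])

# Find the newest published_at per id, then join back to fetch the
# full row that carries that date.
latest = conn.execute("""
    SELECT t.id, t.field, t.published_at
    FROM t1 AS t
    JOIN (
        SELECT id, MAX(published_at) AS max_published
        FROM t1
        GROUP BY id
    ) AS g
        ON t.id = g.id AND t.published_at = g.max_published
    ORDER BY t.id
""").fetchall()

print(latest)
```

Unlike DISTINCT ON, this form returns more than one row per id when several rows share the same latest date.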
OK, I deleted my previous post and will try again. I don't know this topic well, and I'm not sure whether this needs a loop, a stored function, or something else to get what I'm looking for. Here is sample data and the expected output.
I have a single table, A, with the following fields: date created, unique person key, type, location.
I need a Postgres query that, for a given month (a parameter, based on date created) and a given location (a parameter, based on the location field), returns the fields below wherever the unique person key appears again within plus or minus 30 days of the date created, for the same type but across all locations.
Example Data
Date Created | Unique Person | Type | Location
---------------------------------------------------
2/5/2017 | 1 | Admit | Hospital1
2/6/2017 | 2 | Admit | Hospital2
2/15/2017 | 1 | Admit | Hospital2
2/28/2017 | 3 | Admit | Hospital2
3/3/2017 | 2 | Admit | Hospital1
3/15/2017 | 3 | Admit | Hospital3
3/20/2017 | 4 | Admit | Hospital1
4/1/2017 | 1 | Admit | Hospital2
Output for the month of March for Hospital1:
DateCreated| UniquePerson | Type | Location | +-30days | OtherLoc.
------------------------------------------------------------------------
3/3/2017 | 2 | Admit| Hospital1 | 2/6/2017 | Hospital2
Output for the month of March for Hospital2:
None, because no one was seen at Hospital2 in March
Output for the month of March for Hospital3:
DateCreated| UniquePerson | Type | Location | +-30days | otherLoc.
------------------------------------------------------------------------
3/15/2017 | 3 | Admit| Hospital3 | 2/28/2017 | Hospital2
Version 1
I would use a WITH clause. Please note that I've added a primary key column, id, to simplify the query. Its only purpose is to prevent rows from being matched with themselves.
WITH x AS (
    SELECT
        id,
        date_created,
        unique_person_id,
        type,
        location
    FROM
        a
    WHERE
        location = 'Hospital1' AND
        date_trunc('month', date_created) = date_trunc('month', '2017-03-01'::date)
)
SELECT
    x.date_created,
    x.unique_person_id,
    x.type,
    x.location,
    a.date_created AS "+-30days",
    a.location AS other_location
FROM
    x
    JOIN a
        USING (unique_person_id, type)
WHERE
    x.id != a.id AND
    abs(x.date_created - a.date_created) <= 30;
Now a bit of explanation:
First we select the reference data with a WITH clause. Think of it as a temporary table that we can reference in the main query. In this example it holds the "main visits" at the given hospital.
Then we join the "main visits" with other visits by the same person and of the same type (the JOIN condition) that happened within 30 days of the main visit (the WHERE condition).
Notice that the WITH query carries the limits you want to check (location and date). I use the date_trunc function, which truncates a date to the specified precision (a month in this case).
Version 2
As @Laurenz Albe suggested, there is no particular need for a WITH clause, so here is a second version.
SELECT
    x.date_created,
    x.unique_person_id,
    x.type,
    x.location,
    a.date_created AS "+-30days",
    a.location AS other_location
FROM
    a AS x
    JOIN a
        USING (unique_person_id, type)
WHERE
    x.location = 'Hospital1' AND
    date_trunc('month', x.date_created) = date_trunc('month', '2017-03-01'::date) AND
    x.id != a.id AND
    abs(x.date_created - a.date_created) <= 30;
This version is shorter than the first one but, in my opinion, the first is easier to understand. I don't have a large enough data set to test with, so I can't say which one runs faster (the query planner shows similar estimates for both).
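To check the self-join logic against the sample data from the question, here is a minimal sketch that runs an equivalent query in an in-memory SQLite database. SQLite is only a stand-in: it has no date_trunc and no date subtraction, so strftime and julianday are used in their place, and the helper name repeat_visits is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Postgres (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE a (
        id INTEGER PRIMARY KEY,
        date_created TEXT,
        unique_person_id INTEGER,
        type TEXT,
        location TEXT
    )
""")
conn.executemany(
    "INSERT INTO a (date_created, unique_person_id, type, location) "
    "VALUES (?, ?, ?, ?)",
    [
        ("2017-02-05", 1, "Admit", "Hospital1"),
        ("2017-02-06", 2, "Admit", "Hospital2"),
        ("2017-02-15", 1, "Admit", "Hospital2"),
        ("2017-02-28", 3, "Admit", "Hospital2"),
        ("2017-03-03", 2, "Admit", "Hospital1"),
        ("2017-03-15", 3, "Admit", "Hospital3"),
        ("2017-03-20", 4, "Admit", "Hospital1"),
        ("2017-04-01", 1, "Admit", "Hospital2"),
    ],
)

# Same shape as Version 2: self-join on person and type, exclude the row
# itself, and keep pairs whose dates are at most 30 days apart.
QUERY = """
    SELECT x.date_created, x.unique_person_id, x.type, x.location,
           o.date_created, o.location
    FROM a AS x
    JOIN a AS o
        ON o.unique_person_id = x.unique_person_id AND o.type = x.type
    WHERE x.location = ?
      AND strftime('%Y-%m', x.date_created) = ?
      AND x.id != o.id
      AND abs(julianday(x.date_created) - julianday(o.date_created)) <= 30
    ORDER BY x.date_created, o.date_created
"""

def repeat_visits(location, month):
    """Return main visits at `location` in `month` ('YYYY-MM') with nearby visits."""
    return conn.execute(QUERY, (location, month)).fetchall()

for hospital in ("Hospital1", "Hospital2", "Hospital3"):
    print(hospital, repeat_visits(hospital, "2017-03"))
```

Passing the location and month as bound parameters matches the question's request for a parameterised query. With this sample data the three calls reproduce the expected outputs listed in the question.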
Sorry if this post is in fact a duplicate. I could not find anything similar and I am a bit stuck on the approach.
I am trying to populate cells in one sheet depending on the dates in rows of a different sheet, like this:
Sheet1 - entry sheet
ID | Name | Start date | End date
10 | Mike | 1.06.2016 | 2.06.2016
13 | Dido | 1.06.2016 | 5.06.2016
8 | Rene | 2.06.2016 | 20.06.2016
Sheet2 - report sheet
ids/dates | 1.06.2016 | 2.06.2016 | 3.06.2016 | date+1
8 | | Rene | Rene | Rene
10 | Mike | Mike | |
13 | Dido | Dido | Dido | Dido
The Name cells in Sheet2 should be populated from Sheet1's ID, Start date and End date columns. The position of each populated cell in Sheet2 is defined by its ID row and date column, which should match the corresponding values in Sheet1.
This report can be built with a single formula. Please check the example file.
Assumptions
Suppose, you have Sheet1 with data:
Col A: ID
Col B: Name
Col C: Start date
Col D: End Date
Case 1. IDs are unique.
Go to Sheet2 and paste this formula in it:
={{"ids/dates";filter(Sheet1!A2:A,Sheet1!A2:A<>"")},{ArrayFormula(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1));ArrayFormula(if(--(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1)>=filter(Sheet1!C2:C,Sheet1!C2:C<>0))*--(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1)<=filter(Sheet1!D2:D,Sheet1!C2:C<>0))=1,VLOOKUP(FILTER(Sheet1!A2:A,Sheet1!A2:A<>""),Sheet1!A:B,2,0),""))}}
That's all. The report will expand automatically when new data arrives on Sheet1. It will return an error if the data on Sheet1 is incomplete (missing names or dates).
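The long formula is hard to read, so here is a minimal Python sketch of the logic Case 1 describes: one row per ID, one column per date, and the name in every cell whose date falls between that row's start and end dates. The function name build_report and the tuple layout are hypothetical, and including the last end date as a column is a choice made in this sketch rather than something taken from the formula.

```python
from datetime import date, timedelta

def build_report(entries):
    """Build the report grid from (id, name, start, end) rows with unique ids."""
    first = min(start for _, _, start, _ in entries)
    last = max(end for _, _, _, end in entries)
    # One column per calendar day from the earliest start to the latest end.
    days = [first + timedelta(days=n) for n in range((last - first).days + 1)]
    header = ["ids/dates"] + days
    body = [
        [id_] + [name if start <= day <= end else "" for day in days]
        for id_, name, start, end in entries
    ]
    return [header] + body

# Sample rows from Sheet1 in the question.
entries = [
    (10, "Mike", date(2016, 6, 1), date(2016, 6, 2)),
    (13, "Dido", date(2016, 6, 1), date(2016, 6, 5)),
    (8, "Rene", date(2016, 6, 2), date(2016, 6, 20)),
]
report = build_report(entries)
for row in report[1:]:
    print(row[:5])
```

Rows come out in the same order as the entries, which mirrors how the Case 1 formula lists IDs in sheet order.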
Case 2. IDs are NOT unique.
This solution works when IDs are not unique; rows with the same ID are grouped together. One ID belongs to one person in this case.
The formula is a bit longer:
={{"ids/dates";sort(UNIQUE(filter(Sheet1!A2:A,Sheet1!A2:A<>"")))},{ArrayFormula(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1));ArrayFormula(if(QUERY(QUERY({filter(Sheet1!A2:A,Sheet1!A2:A<>""),ArrayFormula((--(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1)>=filter(Sheet1!C2:C,Sheet1!C2:C<>0))*--(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1)<=filter(Sheet1!D2:D,Sheet1!C2:C<>0))))},"select Col1, sum(Col"&JOIN("), sum(Col",ArrayFormula(COLUMN(OFFSET(B2,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))))&") group by Col1"),"Select Col"&JOIN(", Col",ArrayFormula(COLUMN(OFFSET(B2,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))))&" where Col1>0",0)=1,VLOOKUP(sort(UNIQUE(filter(Sheet1!A2:A,Sheet1!A2:A<>""))),Sheet1!A:B,2,0),""))}}
See example here.
Case 3. IDs are NOT unique. One ID <> one name
Here's a working example; please check it. This case is the hardest one, because multiple IDs can refer to multiple names. The final formula:
={{"ids/dates",ArrayFormula(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1))};{sort(UNIQUE(FILTER(Sheet1!A2:A,Sheet1!A2:A<>""))),ArrayFormula(IFERROR(VLOOKUP(QUERY(QUERY({FILTER(Sheet1!A2:B,Sheet1!A2:A<>""),ArrayFormula(--(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1)>=filter(Sheet1!C2:C,Sheet1!C2:C<>0))*--(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1)<=filter(Sheet1!D2:D,Sheet1!C2:C<>0))*row(OFFSET(A1,,,rows(FILTER(Sheet1!A2:B,Sheet1!A2:A<>"")))))},"select Col1, sum(Col"&JOIN("), sum(Col",ArrayFormula(COLUMN(OFFSET(C1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))))&") group by Col1"),"Select Col"&JOIN(", Col",ArrayFormula(COLUMN(OFFSET(B2,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))))&" where Col1>0",0),{ArrayFormula(row(OFFSET(A1,,,rows(FILTER(Sheet1!A2:B,Sheet1!A2:A<>""))))),FILTER(Sheet1!A2:B,Sheet1!A2:A<>"")},3,0)))}}
The formula will give incorrect results if two date ranges for the same ID overlap:
102 Mike 6/21/2016 6/27/2016
102 Mike 6/11/2016 6/22/2016
I want to search a full-text search column using only the first few letters of a word. For example:
select "Name","Country","_score" from datatable where match("Country", 'China');
This returns many rows, which is fine. My question is how I can search like this, for example:
select "Name","Country","_score" from datatable where match("Country", 'Ch');
I want to see China, Chile, etc.
I think the phrase_prefix match_type could be the answer, but I don't know the correct syntax for using it.
The match predicate supports different match types through the clause using match_type [with (match_parameter = [value])].
So, for your example, using the phrase_prefix match type:
select "Name","Country","_score" from datatable where match("Country", 'Ch') using phrase_prefix;
This gives you the desired results.
See the match predicate documentation: https://crate.io/docs/en/latest/sql/fulltext.html?#match-predicate
If you just need to match the beginning of a string column, you don't need a full-text analyzed column. You can use the LIKE operator instead, e.g.:
cr> create table names_table (name string, country string);
CREATE OK (0.840 sec)
cr> insert into names_table (name, country) values ('foo', 'China'), ('bar','Chile'), ('foobar', 'Austria');
INSERT OK, 3 rows affected (0.049 sec)
cr> select * from names_table where country like 'Ch%';
+---------+------+
| country | name |
+---------+------+
| Chile | bar |
| China | foo |
+---------+------+
SELECT 2 rows in set (0.037 sec)