I am currently getting a list of values from a related field like so:
List ( join_table::id_b )
What I would like to do is filter that list by a second field in the same related table. Pseudocode:
List ( join_table::id_b ; join_table::other = "foo" )
I'm not really sure how to filter it down.
The List() function will return a list of (non-empty) values from all related records.
To get a list filtered by a second field, you could do any one of the following:
Define a calculation field in the join table = If ( other = "foo" ; id_b ) and use this field in your List() function call instead of the id_b field;
Construct a relationship filtered by the other field;
Use the ExecuteSQL() function instead of List();
Write your own recursive custom function (requires the Advanced version to install).
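The first option works because List() skips empty values, and the If() calculation returns empty for every record where other is not "foo". A minimal Python sketch of that behaviour follows. The records and their values are made up for illustration, and the helper names `filtered_id` and `fm_list` are hypothetical stand-ins, not FileMaker functions.

```python
# Hypothetical related records from join_table; the values are illustrative only.
related_records = [
    {"id_b": "101", "other": "foo"},
    {"id_b": "102", "other": "bar"},
    {"id_b": "103", "other": "foo"},
]


def filtered_id(record):
    """Stand-in for the calculation field If ( other = "foo" ; id_b )."""
    return record["id_b"] if record["other"] == "foo" else ""


def fm_list(values):
    """Stand-in for List(): joins the non-empty values, one per line."""
    return "\n".join(v for v in values if v != "")


result = fm_list(filtered_id(r) for r in related_records)
print(result)
```

Only the records that pass the condition contribute a value, so the list is filtered without changing the relationship itself.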
I would like to create a ForEach loop and need advice:
I have a Lookup activity named "Fetch" with the query "Select CustomerName, Country From Customers".
It returns rows like "Tesla, USA" and "Nissan, Japan". There are 10 rows in total.
I would like the ForEach loop to run 10 times and use the CustomerName and Country values in the pipeline.
The ForEach Items setting is currently: @activity('Fetch').output (something wrong here?)
I would like to create a new Lookup inside the ForEach, with the query "SELECT * FROM Table WHERE CustomerName = 'CustomerName' and Country = 'CountryName'".
Error of ForEach:
The function 'length' expects its parameter to be an array or a string. The provided value is of type 'Object'.
The Items property of the For Each activity should look something like this:
@activity('Fetch').output.value
You can then reference columns from your Lookup within the For Each activity using the item() syntax, e.g. @item().CustomerName. Remember, expressions in Azure Data Factory (ADF) start with the @ symbol, but you don't have to repeat it in the string.
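The 'length' error makes sense once you look at the shape of the Lookup output: with "First row only" unchecked, the rows sit in an array under the value key of an object, and the ForEach needs the array. Here is a rough Python sketch of that shape and of what each iteration sees. The dictionary layout is an assumption about the output, only two of the ten rows are shown, and `build_query` is a hypothetical helper.

```python
# Assumed shape of activity('Fetch').output when "First row only" is unchecked.
fetch_output = {
    "count": 2,
    "value": [
        {"CustomerName": "Tesla", "Country": "USA"},
        {"CustomerName": "Nissan", "Country": "Japan"},
    ],
}


def build_query(item):
    """Builds the inner Lookup query for one row, as item() would supply it."""
    return (
        "SELECT * FROM Table "
        f"WHERE CustomerName = '{item['CustomerName']}' "
        f"and Country = '{item['Country']}'"
    )


# The ForEach iterates over output.value (a list), not over output (an object).
queries = [build_query(item) for item in fetch_output["value"]]
for q in queries:
    print(q)
```

Inside the ForEach, the equivalent query text for the inner Lookup could be written with string interpolation, something like: SELECT * FROM Table WHERE CustomerName = '@{item().CustomerName}' and Country = '@{item().Country}'.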
This seems like it would be straightforward to do but I just can not figure it out. I have a query that returns an ARRAY of strings in one of the columns. I want that array to only contain unique strings. Here is my query:
SELECT
    f."_id",
    ARRAY[public.getdomain(f."linkUrl"), public.getdomain(f."sourceUrl")] AS file_domains,
    public.getuniqdomains(s."originUrls", s."testUrls") AS source_domains
FROM
    files f
LEFT JOIN
    sources s
ON
    s."_id" = f."sourceId"
Here's an example of a row from my return table
_id     | file_domains                              | source_domains
2574873 | {cityofmontclair.org,cityofmontclair.org} | {cityofmontclair.org}
I need file_domains to contain only unique values, i.e. a 'set' instead of a 'list'. Like this:
_id     | file_domains          | source_domains
2574873 | {cityofmontclair.org} | {cityofmontclair.org}
Use a CASE expression in place of the ARRAY[...] constructor for file_domains:
CASE WHEN public.getdomain(f."linkUrl") = public.getdomain(f."sourceUrl")
    THEN ARRAY[public.getdomain(f."linkUrl")]
    ELSE ARRAY[public.getdomain(f."linkUrl"), public.getdomain(f."sourceUrl")]
END AS file_domains
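The CASE expression covers exactly two candidates. The general idea is dropping duplicates while keeping the first-seen order, and a small Python sketch shows that. It is an illustration of the logic only, not something to paste into the query, and `unique_domains` is a hypothetical name.

```python
def unique_domains(domains):
    """Drops duplicates and keeps first-seen order (dicts preserve insertion order)."""
    return list(dict.fromkeys(domains))


file_domains = unique_domains(["cityofmontclair.org", "cityofmontclair.org"])
print(file_domains)
```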
I have a List:hdtList which contain columns that represent the columns of a Hive table:
forecast_id bigint,period_year bigint,period_num bigint,period_name string,drm_org string,ledger_id bigint,currency_code string,source_system_name string,source_record_type string,gl_source_name string,gl_source_system_name string,year string
I have a List: partition_columns which contains two elements: source_system_name, period_year
Using the List: partition_columns, I am trying to match them and move the corresponding columns in List: hdtList to the end of it as below:
val (pc, notPc) = hdtList.partition(c => partition_columns.contains(c.takeWhile(x => x != ' ')))
But when I print them as: println(notPc.mkString(",") + "," + pc.mkString(","))
I see the output unordered as below:
forecast_id bigint,period_num bigint,period_name string,drm_org string,ledger_id bigint,currency_code string,source_record_type string,gl_source_name string,gl_source_system_name string,year string,period string,period_year bigint,source_system_name string
The column period_year comes first and source_system_name last. Is there any way I can produce the output below, so that the order of the columns in the List: partition_columns is maintained?
forecast_id bigint,period_num bigint,period_name string,drm_org string,ledger_id bigint,currency_code string,source_record_type string,gl_source_name string,gl_source_system_name string,year string,period string,source_system_name string,period_year bigint
I know there is an option to reverse a List but I'd like to learn if I can implement a collection that maintains that order of insert.
It doesn't matter which collection you use; you only use partition_columns to call contains, which doesn't depend on its order, so how could that order be maintained?
But your code does maintain an order: it's just hdtList's.
Something like
// get is ugly, but safe here
val pc1 = partition_columns.map(x => pc.find(y => y.startsWith(x)).get)
after your code will give you the desired order, though there's probably a more efficient way to do it.
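The same idea can be sketched in Python: split the column definitions as before, then rebuild the partition part by walking partition_columns, so its order wins. The list below is a shortened version of the one in the question, and `column_name` is a hypothetical helper; treat it as a sketch of the approach rather than a drop-in replacement for the Scala.

```python
# Shortened column list from the question, as "name type" strings.
hdt_list = [
    "forecast_id bigint",
    "period_year bigint",
    "period_num bigint",
    "source_system_name string",
    "year string",
]
partition_columns = ["source_system_name", "period_year"]


def column_name(col_def):
    """Returns the part before the first space, i.e. the column name."""
    return col_def.split(" ", 1)[0]


# Index the definitions by name so each partition column is a direct lookup.
by_name = {column_name(c): c for c in hdt_list}

# Non-partition columns keep hdt_list's order.
not_pc = [c for c in hdt_list if column_name(c) not in partition_columns]

# Partition columns follow partition_columns' order, not hdt_list's.
pc = [by_name[name] for name in partition_columns if name in by_name]

reordered = not_pc + pc
print(",".join(reordered))
```

Looking names up in a dictionary avoids scanning the list once per partition column, and comparing whole names avoids a prefix such as period matching period_year by accident.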
I am currently trying to create a concatenated string for each group. This string should be the concatenation of all the occurrences of the field.
For the moment my code looks like this:
grouped = GROUP a by group_field;
b = FOREACH grouped {
    unique_field = DISTINCT myfield;
    tupl = TOTUPLE(unique_field) ;
    FOREACH tupl GENERATE group as id, CONCAT( ? ) as my_new_string;
}
The thing is, I do not know in advance how many distinct values each group has or what they contain. I don't know what to put in place of the ? to make it work.
TOTUPLE is not doing what you are expecting: it is making a one-element tuple where that one element is the bag of unique_field.
Also, CONCAT only takes two things to concat and they must be explicitly defined. Let's say that you have a schema like A: {A1: chararray, A2: chararray, A3: chararray} and you want to concatenate all fields together. You will have to do this (which is obviously not ideal): CONCAT(CONCAT(A1, A2), A3).
Anyways, this problem can be easily solved with a python UDF.
myudfs.py
#!/usr/bin/python
@outputSchema('concated:chararray')
def concat_bag(BAG):
    # Pig hands a bag to Jython as a list of tuples; each tuple here holds one field
    return ''.join(t[0] for t in BAG)
This UDF would be used in your script like:
Register 'myudfs.py' using jython as myfuncs;
grouped = GROUP a by group_field;
b = FOREACH grouped {
    -- inside the nested block the grouped records are in the bag named a
    unique_field = DISTINCT a.myfield;
    GENERATE group as id, myfuncs.concat_bag(unique_field);
}
I just noticed the FOREACH tupl GENERATE ... line. That is not valid syntax. The last statement in a nested FOREACH should be a GENERATE.
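To check the UDF logic outside Pig, the function can be run as plain Python. This sketch assumes Pig passes the bag as a list of single-field tuples, leaves out the outputSchema decorator (Pig supplies it at registration), and adds a hypothetical sep parameter for a delimiter. The sample bag is made up.

```python
def concat_bag(bag, sep=""):
    """Joins the first field of every tuple in the bag into one string."""
    return sep.join(str(t[0]) for t in bag)


# Made-up bag, shaped the way a DISTINCT over one field is assumed to arrive.
sample_bag = [("a",), ("b",), ("c",)]

print(concat_bag(sample_bag))
print(concat_bag(sample_bag, sep=","))
```

The order of values inside a bag is not guaranteed, so the concatenated string may differ between runs unless the bag is sorted first.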
I'm looking for a way to get specific fields from the mongoskin find function. In other words, in SQL we say select column1, column2, column3 from mytable rather than select *.
Currently my query looks like the one below, and I want to specify the fields I'm looking for rather than getting the whole JSON object.
db.collection('collacta').find().toArray(function(err, result) {
    if (result) {
        ...
    } else {
        ...
    };
});
thanks
To project fields, pass a second DBObject describing the projection. With the MongoDB Java driver:
DBCursor cursor = collection.find(query, projectionQuery);
The projection is a DBObject of key-value pairs, where
the key is the name of the field you want to project and the value is either 0 or 1:
0 - exclude the field from the result set.
1 - include the field in the result set.
For more info, see here.