How to use result of Lookup Activity in next Lookup of Azure Data Factory? - azure-data-factory

I have a Lookup "Fetch Customers" with the SQL statement:
Select Count(CustomerId) As 'Row_count', Min(sales_amount) as 'Min_Sales' From [sales].[Customers]
It returns the values:
10, 5000
Next I have a Lookup "Update Min Sales" with the following SQL statement, but I am getting an error:
Update Sales_Min_Sales
SET Row_Count = @activity('Fetch Customers').output.Row_count,
Min_Sales = @activity('Fetch Customers').output.Min_Sales
Select 1
The same error occurs even if I set the Lookup to:
Select @activity('Fetch Customers').output.Row_count
Error:
A database operation failed with the following error: 'Must declare the scalar variable
"@activity".',Source=,''Type=System.Data.SqlClient.SqlException,Message=Must declare the
scalar variable "@activity".,Source=.Net SqlClient Data
Provider,SqlErrorNumber=137,Class=15,ErrorCode=-2146232060,State=2,Errors=
[{Class=15,Number=137,State=2,Message=Must declare the scalar variable "@activity".,},],'

I have a similar setup to yours: two Lookup activities.
The first lookup brings back the min ID and max ID, as shown:
{
"count": 1,
"value": [
{
"Min": 1,
"Max": 30118
}
],
"effectiveIntegrationRuntime": "DefaultIntegrationRuntime (East US)",
"billingReference": {
"activityType": "PipelineActivity",
"billableDuration": [
{
"meterType": "AzureIR",
"duration": 0.016666666666666666,
"unit": "DIUHours"
}
]
},
"durationInQueue": {
"integrationRuntimeQueue": 22
}
}
In my second lookup I am using the below expression:
Update saleslt.customer set somecol=someval where CustomerID=@{activity('Lookup1').output.Value[0]['Min']}
Select 1 as dummy
Just note that we have to access the lookup output using indices as shown, and place the activity expression inside @{ }.
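Applied to the original pipeline, the "Update Min Sales" query could reference the first Lookup the same way. This is only a sketch, assuming "Fetch Customers" has "First row only" unchecked so its result is exposed as an array under output.value (as in the output shown above); the column aliases are the ones from the question:
Update Sales_Min_Sales
SET Row_Count = @{activity('Fetch Customers').output.value[0].Row_count},
    Min_Sales = @{activity('Fetch Customers').output.value[0].Min_Sales}
Select 1 as dummy
If "First row only" is left enabled instead, the same values are available under output.firstRow, e.g. @{activity('Fetch Customers').output.firstRow.Row_count}.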

Related

How to get Quantile/median values in pydruid

My goal is to query the median value of the column height in my Druid datasource. I was able to use other aggregations like count and count distinct. Here's my query so far:
group = query.groupby(
    datasource=datasource,
    granularity='all',
    intervals='2020-01-01T00:00:00+00:00/2101-01-01T00:00:00+00:00',
    dimensions=[
        "category_a"
    ],
    filter=(Dimension("country") == country_id),
    aggregations={
        'count': longsum('count'),
        'count_distinct_city': aggregators.thetasketch('city'),
    }
)
There's a class Quantile under postaggregator.py so I tried using this.
class Quantile(Postaggregator):
    def __init__(self, name, probability):
        Postaggregator.__init__(self, None, None, name)
        self.post_aggregator = {
            "type": "quantile",
            "fieldName": name,
            "probability": probability,
        }
Here's my attempt at getting the median:
post_aggregations={
    'median_value': postaggregator.Quantile(
        'height', 50
    )
}
The error I'm getting here is "Could not resolve type id 'quantile' as a subtype of [simple type, class io.druid.query.aggregation.PostAggregator]":
Druid Error: {'error': 'Unknown exception', 'errorMessage': 'Could not resolve type id \'quantile\' as a subtype of [simple type, class io.druid.query.aggregation.PostAggregator]: known type ids = [arithmetic, constant, doubleGreatest, doubleLeast, expression, fieldAccess, finalizingFieldAccess, hyperUniqueCardinality, javascript, longGreatest, longLeast, quantilesDoublesSketchToHistogram, quantilesDoublesSketchToQuantile, quantilesDoublesSketchToQuantiles, quantilesDoublesSketchToString, sketchEstimate, sketchSetOper, thetaSketchEstimate, thetaSketchSetOp] (for POJO property \'postAggregations\')\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 856] (through reference chain: io.druid.query.groupby.GroupByQuery["postAggregations"]->java.util.ArrayList[0])', 'errorClass': 'com.fasterxml.jackson.databind.exc.InvalidTypeIdException', 'host': None}
I modified the code of pydruid to get this working on our end. I've created a new aggregator and postaggregator under /pydruid/utils.
aggregator.py
def quantilesDoublesSketch(raw_column, k=128):
    return {"type": "quantilesDoublesSketch", "fieldName": raw_column, "k": k}
postaggregator.py
class QuantilesDoublesSketchToQuantile(Postaggregator):
    def __init__(self, name: str, field_name: str, fraction: float):
        self.post_aggregator = {
            "type": "quantilesDoublesSketchToQuantile",
            "name": name,
            "fraction": fraction,
            "field": {
                "fieldName": field_name,
                "name": field_name,
                "type": "fieldAccess",
            },
        }
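With these patched helpers, wiring them into the earlier groupby query could look roughly like the sketch below. The names 'height_sketch' and 'median_height' are just illustrative, and the Druid cluster needs the datasketches extension loaded for the quantilesDoublesSketch type:
group = query.groupby(
    datasource=datasource,
    granularity='all',
    intervals='2020-01-01T00:00:00+00:00/2101-01-01T00:00:00+00:00',
    dimensions=["category_a"],
    filter=(Dimension("country") == country_id),
    aggregations={
        # build a quantiles sketch over the raw 'height' column
        'height_sketch': aggregators.quantilesDoublesSketch('height'),
    },
    post_aggregations={
        # read the 0.5 quantile (the median) back out of the sketch
        'median_height': postaggregator.QuantilesDoublesSketchToQuantile(
            'median_height', 'height_sketch', 0.5
        ),
    },
)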
This is my first time creating a PR! Hopefully they accept it and publish it officially.
https://github.com/druid-io/pydruid/pull/287

Compose Transporter throws error when collection_filters is set to sync data for current day from DocumentDB/MongoDB to file/ElasticSearch

I am using Compose Transporter to sync data from DocumentDB to an ElasticSearch instance in AWS. After a one-time sync, I added the following collection_filters in pipeline.js to sync incremental data daily:
// pipeline.js
var source = mongodb({
"uri": "mongodb <URI>"
"ssl": true,
"collection_filters": '{ "mycollection": { "createdDate": { "$gt": new Date(Date.now() - 24*60*60*1000) } }}',
})
var sink = file({
"uri": "file://mongo_dump.json"
})
t.Source("source", source, "^mycollection$").Save("sink", sink, "/.*/")
I get following error:
$ transporter run pipeline.js
panic: malformed collection_filters [recovered]
panic: Panic at 32: malformed collection_filters [recovered]
panic: Panic at 32: malformed collection_filters
goroutine 1 [running]:
github.com/compose/transporter/vendor/github.com/dop251/goja.(*Runtime).RunProgram.func1(0xc420101d98)
/Users/JP/gocode/src/github.com/compose/transporter/vendor/github.com/dop251/goja/runtime.go:779 +0x98
When I change collection_filters so that the value of the "$gt" key is a single string token (see below), the malformed error vanishes, but it doesn't fetch any documents:
'{ "mycollection": { "createdDate": { "$gt": "new Date(Date.now() - 24*60*60 * 1000)" } }}',
To check whether something is fundamentally wrong with the way I am querying, I tried a simple string filter, and that works well:
"collection_filters": '{ "articles": { "createdBy": "author name" }}',
I tried various ways to pass the createdDate filter, but I either get the malformed error or no data. However, the same filter in the mongo shell gives me the expected output. Note that I tried with ES as well as a file as the sink before asking here.

Delete record based on condition from mongo database using jmeter

I am trying to delete records from a Mongo database using a JMeter JSR223 Groovy sampler, but I am getting an error.
Request:
// delete the records whose insertDate falls between the two timestamps
MongoCollection Collection = database.getCollection("TEST_COMM");
Collection.deleteMany{
$and: [
{
"insertDate": {
$gte: ISODate("2019-10-15T15:16:45.328+0000")
}
}
,
{
"insertDate": {
$lte: ISODate("2019-10-15T17:02:44.017+0000")
}
}
]
}
Response:
Script1.groovy: 50: expecting '}', found ':' # line 50, column 25.
"insertDate": {
^
Can someone help in resolving the issue?
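For what it's worth, the mongo-shell syntax ($and / ISODate blocks) is not valid Groovy, which is what the parser is complaining about. Below is only a minimal sketch of the same delete expressed through the MongoDB Java driver's Filters API; it assumes `database` is the already-connected MongoDatabase from the script above and that the driver jar is on JMeter's classpath:
import com.mongodb.client.MongoCollection
import com.mongodb.client.model.Filters
import org.bson.Document
import java.text.SimpleDateFormat

// parse the two ISO timestamps into java.util.Date values
def fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ")
def from = fmt.parse("2019-10-15T15:16:45.328+0000")
def to = fmt.parse("2019-10-15T17:02:44.017+0000")

MongoCollection<Document> collection = database.getCollection("TEST_COMM")

// deleteMany takes a Bson filter; Filters.and/gte/lte build the same
// $and / $gte / $lte query the shell syntax was trying to express
def result = collection.deleteMany(
    Filters.and(
        Filters.gte("insertDate", from),
        Filters.lte("insertDate", to)))

log.info("Deleted " + result.deletedCount + " documents")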

Get the groups of a customeruser in otrs

I am extending OTRS with an app and need to get the groups a customeruser is in. I want to do this by communicating with the SessionGet-Endpoint (https://doc.otrs.com/doc/api/otrs/6.0/Perl/Kernel/GenericInterface/Operation/Session/SessionGet.pm.html)
The SessionGet endpoint returns a lot of information about the user, but not the groups they are in. I am not talking about agents who can log in to the OTRS backend, but about customerusers.
I am using OTRS 6 because it was the only version available in Docker. I created the REST endpoints in the backend and everything works well. There is a new feature for which I need the information about the groups.
I had a look at the OTRS system config but could not figure out whether it is possible to include this information in the response.
Although I am a programmer, I did not want to write Perl because of ... reasons.
I had a look at the file which handles the incoming request at /opt/otrs/Kernel/GenericInterface/Operation/Session/SessionGet.pm and traced the calls to the actual file where the information is collected from the database, /opt/otrs/Kernel/System/AuthSession/DB.pm. In line 169 the SQL statement is written, so it came to my mind that I could just extend it to also fetch the group information, because, as I said, I did not want to write Perl...
A typical response from this endpoint looks like this:
{
"SessionData": [
{
"Value": "2",
"Key": "ChangeBy"
},
{
"Value": "2019-06-26 13:43:18",
"Key": "ChangeTime"
},
{
"Value": "2",
"Key": "CreateBy"
},
{
"Value": "2019-06-26 13:43:18",
"Key": "CreateTime"
},
{
"Value": "XXX",
"Key": "CustomerCompanyCity"
},
{
"Value": "",
"Key": "CustomerCompanyComment"
}
...
]
}
A good approach would be to just insert another Key-Value pair with the IDs of the groups. The SQL statement queries only one table, $Self->{SessionTable}, usually called otrs.sessions.
Using the resources listed below, I created a SQL statement which extends the existing one with the needed information. You can find it here:
$DBObject->Prepare(
SQL => "
(
SELECT id, data_key, data_value, serialized FROM $Self->{SessionTable} WHERE session_id = ? ORDER BY id ASC
)
UNION ALL
(
SELECT
(
SELECT MAX(id) FROM $Self->{SessionTable} WHERE session_id = ?
) +1
AS id,
'UserGroupsID' AS data_key,
(
SELECT GROUP_CONCAT(DISTINCT group_id SEPARATOR ', ')
FROM otrs.group_customer_user
WHERE user_id =
(
SELECT data_value
FROM $Self->{SessionTable}
WHERE session_id = ?
AND data_key = 'UserID'
ORDER BY id ASC
)
)
AS data_value,
0 AS serialized
)",
Bind => [ \$Param{SessionID}, \$Param{SessionID}, \$Param{SessionID} ],
);
Whoever needs to get the groups of a customeruser can replace the existing code with the one provided. At least in my case it works very well. Now I get the expected Key-Value pair:
{
"Value": "10, 11, 6, 7, 8, 9",
"Key": "UserGroupsID"
},
I used the following resources:
Adding the results of multiple SQL selects?
Can I concatenate multiple MySQL rows into one field?
Add row to query result using select
Happy coding,
Nico

How to properly use context variables in OrientDB ETL configuration file?

Summary
I am trying to learn about the OrientDB ETL configuration JSON file.
Assuming a CSV file where:
each row is a single vertex
a 'class' column gives the intended class of the vertex
there are multiple classes for the vertices (Foo, Bar, Baz)
How do I set the class of the vertex to be the value of the 'class' column?
Efforts to Troubleshoot
I have spent a LOT of time in the OrientDB ETL documentation trying to solve this. I have tried many different combinations of let and block and code components. I have tried variable names like className and $className and ${classname}.
Current Results:
The code component is able to correctly print the value of 'className', so I know that it is being set correctly.
The vertex component isn't referencing the variable correctly, and consequently sets the class of each vertex to null.
Context
I have a freshly created database (PLOCAL GRAPH) on localhost called 'deleteme'.
I have a vertex CSV file (nodes.csv) that looks like this:
id,name,class
1,Jack,Foo
2,Jill,Bar
3,Gephri,Baz
And an ETL configuration file (test.json) that looks like this:
{
"config": {
"log": "DEBUG"
},
"source": {"file": {"path": "nodes.csv"}},
"extractor": {"csv": {}},
"transformers": [
{"block": {"let": {"name": "$className",
"value": "$input.class"}}},
{"code": {"language": "Javascript",
"code": "print(className + '\\n'); input;"}},
{"vertex": {"class": "$className"}}
],
"loader": {
"orientdb": {
"dbURL": "remote:localhost:2424/deleteme",
"dbUser": "admin",
"dbPassword": "admin",
"dbType": "graph",
"tx": false,
"wal": false,
"batchCommit": 1000,
"classes": [
{"name": "Foo", "extends": "V"},
{"name": "Bar", "extends": "V"},
{"name": "Baz", "extends": "V"}
]
}
}
}
And when I run the ETL job, I have output that looks like this:
aj#host:~/bin/orientdb-community-2.1.13/bin$ ./oetl.sh test.json
OrientDB etl v.2.1.13 (build 2.1.x#r9bc1a54a4a62c4de555fc5360357f446f8d2bc84; 2016-03-14 17:00:05+0000) www.orientdb.com
BEGIN ETL PROCESSOR
[file] INFO Reading from file nodes.csv with encoding UTF-8
[orientdb] DEBUG - OrientDBLoader: created vertex class 'Foo' extends 'V'
[orientdb] DEBUG orientdb: found 0 vertices in class 'null'
+ extracted 0 rows (0 rows/sec) - 0 rows -> loaded 0 vertices (0 vertices/sec) Total time: 1001ms [0 warnings, 0 errors]
[orientdb] DEBUG - OrientDBLoader: created vertex class 'Bar' extends 'V'
[orientdb] DEBUG orientdb: found 0 vertices in class 'null'
[orientdb] DEBUG - OrientDBLoader: created vertex class 'Baz' extends 'V'
[orientdb] DEBUG orientdb: found 0 vertices in class 'null'
[csv] DEBUG document={id:1,class:Foo,name:Jack}
[1:block] DEBUG Transformer input: {id:1,class:Foo,name:Jack}
[1:block] DEBUG Transformer output: {id:1,class:Foo,name:Jack}
[1:code] DEBUG Transformer input: {id:1,class:Foo,name:Jack}
Foo
[1:code] DEBUG executed code=OCommandExecutorScript [text=print(className); input;], result={id:1,class:Foo,name:Jack}
[1:code] DEBUG Transformer output: {id:1,class:Foo,name:Jack}
[1:vertex] DEBUG Transformer input: {id:1,class:Foo,name:Jack}
[1:vertex] DEBUG Transformer output: v(null)[#3:0]
[csv] DEBUG document={id:2,class:Bar,name:Jill}
[2:block] DEBUG Transformer input: {id:2,class:Bar,name:Jill}
[2:block] DEBUG Transformer output: {id:2,class:Bar,name:Jill}
[2:code] DEBUG Transformer input: {id:2,class:Bar,name:Jill}
Bar
[2:code] DEBUG executed code=OCommandExecutorScript [text=print(className); input;], result={id:2,class:Bar,name:Jill}
[2:code] DEBUG Transformer output: {id:2,class:Bar,name:Jill}
[2:vertex] DEBUG Transformer input: {id:2,class:Bar,name:Jill}
[2:vertex] DEBUG Transformer output: v(null)[#3:1]
[csv] DEBUG document={id:3,class:Baz,name:Gephri}
[3:block] DEBUG Transformer input: {id:3,class:Baz,name:Gephri}
[3:block] DEBUG Transformer output: {id:3,class:Baz,name:Gephri}
[3:code] DEBUG Transformer input: {id:3,class:Baz,name:Gephri}
Baz
[3:code] DEBUG executed code=OCommandExecutorScript [text=print(className); input;], result={id:3,class:Baz,name:Gephri}
[3:code] DEBUG Transformer output: {id:3,class:Baz,name:Gephri}
[3:vertex] DEBUG Transformer input: {id:3,class:Baz,name:Gephri}
[3:vertex] DEBUG Transformer output: v(null)[#3:2]
END ETL PROCESSOR
+ extracted 3 rows (4 rows/sec) - 3 rows -> loaded 3 vertices (4 vertices/sec) Total time: 1684ms [0 warnings, 0 errors]
Oh, and what does DEBUG orientdb: found 0 vertices in class 'null' mean?
Try this. I wrestled with this for a while too, but the below setup worked for me.
Note that setting #class before the vertex transformer will initialize a Vertex with the proper class.
"transformers": [
{"block": {"let": {"name": "$className",
"value": "$input.class"}}},
{"code": {"language": "Javascript",
"code": "print(className + '\\n'); input;"}},
{ "field": {
"fieldName": "#class",
"expression": "$className"
}
},
{"vertex": {}}
]
To get your result, you could use ETL to import the data from the CSV into a class named "Generic".
Then, through a JS function, "separateClass()", create new classes taking the name from the 'class' property imported from the CSV, and move the vertices from the Generic class into the new classes.
The JSON file:
{
"source": { "file": {"path": "data.csv"}},
"extractor": { "row": {}},
"begin": [
{ "let": { "name": "$className", "value": "Generic"} }
],
"transformers": [
{"csv": {
"separator": ",",
"nullValue": "NULL",
"columnsOnFirstLine": true,
"columns": [
"id:Integer",
"name:String",
"class:String"
]
}
},
{"vertex": {"class": "$className", "skipDuplicates": true}}
],
"loader": {
"orientdb": {
"dbURL": "remote:localhost/test",
"dbType": "graph"
}
}
}
After importing the data with ETL, create the following function in JavaScript:
var g = orient.getGraphNoTx();
var queryResult= g.command("sql", "SELECT FROM Generic");
//example vertex fields: ID, NAME, CLASS
if (!queryResult.length) {
print("Empty");
} else {
//for each record, create the class if needed and insert the row
for (var i = 0; i < queryResult.length; i++) {
var className = queryResult[i].getProperty("class").toString();
//check if className has already been created
var countClass = g.command("sql","select from V where #class = '"+className+"'");
if (!countClass.length) {
g.command("sql","CREATE CLASS "+className+" extends V");
g.command("sql"," CREATE PROPERTY "+className+".id INTEGER");
g.command("sql"," CREATE PROPERTY "+className+".name STRING");
g.commit();
}
var id = queryResult[i].getProperty("id").toString();
var name = queryResult[i].getProperty("name").toString();
g.command("sql","INSERT INTO "+className+ " (id, name) VALUES ("+id+",'"+name+"')");
g.commit();
}
//remove class generic
g.command("sql","truncate class Generic unsafe");
}
The result should look like the one shown in the picture.