How to split a column based on the first occurrence of a delimiter in MongoDB - mongodb

I have a column like this, and it should be split on the first "-". Example:
MGESAD :
"6095 - NCAM - US - GIUTCB - US Consumer Bank - USRB"
"6595 - NBAM - US - UDAS - Consumer Bank - USRB"
"0595 - NWWAM - US - GWCB - US BANK Bank - USRB - TBL"
I need to split this column into:
Col1 Col2
6095 NCAM - US - GIUTCB - US Consumer Bank - USRB
6595 NBAM - US - UDAS - Consumer Bank - USRB
0595 NWWAM - US - GWCB - US BANK Bank - USRB - TBL
What I have tried so far:
db.getCollection("arTes").aggregate([
  {
    $addFields: {
      MGE_ID: { $arrayElemAt: [ { "$split": [ "$MGESAD", "-"] }, 0 ] },
      MGE_DESC: { $arrayElemAt: [ { "$split": [ "$MGESAD", "-"] }, 2 ] }
    }
  }
])
MGE_DESC gives me only the element at index 2, but I need the entire string after the first split.
Let me know if there is an easier way to do this.

Query
Pipeline updates require MongoDB >= 4.2.
Because you want to split on the first occurrence of "-", you can do it without splitting on every "-".
The query below finds the index of the first " - "; the part to its left becomes MGESAD and the part to its right becomes MGE_DESC.
* If you only want to aggregate (not update), use the same [{"$set": ...}] pipeline in an aggregation.
* If you needed to split on a "-" that is neither the first nor the last, you could $split and then rebuild the parts with $concat (and maybe $reduce, depending on your needs); this case is simpler, so those weren't used.
db.getCollection("arTes").update({},
  [{"$set":
     {"MGESAD":
        {"$substrCP": ["$MGESAD", 0, {"$indexOfCP": ["$MGESAD", " - "]}]},
      "MGE_DESC":
        {"$substrCP":
           ["$MGESAD",
            {"$add": [{"$indexOfCP": ["$MGESAD", " - "]}, 3]},
            {"$strLenCP": "$MGESAD"}]}}}],
  {"multi": true})
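For pymongo users, the same idea can be sketched in Python. This is a sketch, not the answer's exact query: first_split_stage builds an aggregation-only $set stage (the case the first "*" note mentions) that writes to MGE_ID and MGE_DESC instead of overwriting MGESAD (the output names are taken from the question), and split_first is a plain-Python mirror of that stage so expected values can be checked without a server. Both function names are hypothetical. One assumption to flag: when " - " is absent, $indexOfCP returns -1 and the stage below does not guard against that, while split_first simply returns the whole value and an empty description.

```python
from typing import Dict, Tuple

SEP = " - "  # the answer splits on " - " (with spaces), so the offset is len(SEP) == 3


def first_split_stage(field: str = "MGESAD", sep: str = SEP) -> Dict:
    """Build a $set stage that splits `field` on the first `sep` (hypothetical helper)."""
    src = "$" + field
    idx = {"$indexOfCP": [src, sep]}  # code-point index of the first separator
    return {
        "$set": {
            # everything before the first separator
            "MGE_ID": {"$substrCP": [src, 0, idx]},
            # everything after it; a count of the full length safely runs to the end
            "MGE_DESC": {"$substrCP": [src, {"$add": [idx, len(sep)]}, {"$strLenCP": src}]},
        }
    }


def split_first(value: str, sep: str = SEP) -> Tuple[str, str]:
    """Plain-Python equivalent of the stage, for checking expected output."""
    i = value.find(sep)
    if i == -1:
        return value, ""  # assumption: no separator -> whole value, empty description
    return value[:i], value[i + len(sep):]
```

With pymongo, something like list(collection.aggregate([first_split_stage()])) would run the stage; split_first("6095 - NCAM - US - GIUTCB - US Consumer Bank - USRB") gives ("6095", "NCAM - US - GIUTCB - US Consumer Bank - USRB").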


Can a slot take entity values without an action function or forms in Rasa?

Is it possible to pass entity values to slots without a form or writing an action function?
nlu.yml
nlu:
- intent: place_order
  examples: |
    - wanna [large](size) shoes for husky
    - need a [small](size) [green](color) boots for pupps
    - have [blue](color) socks
    - would like to place an order
- lookup: size
  examples: |
    - small
    - medium
    - large
- synonym: small
  examples: |
    - small
    - s
    - tiny
- synonym: large
  examples: |
    - large
    - l
    - big
- lookup: color
  examples: |
    - white
    - red
    - green
domain.yml
version: "2.0"
intents:
  - greet
  - goodbye
  - affirm
  - deny
  - mood_great
  - mood_unhappy
  - bot_challenge
  - place_order
entities:
  - size
  - color
slot:
  size:
    type: text
  color:
    type: text
responses:
  utter_greet:
  - text: "Hey! can I assist you ?"
  utter_order_list:
  - text : "your order is {size} [color} boots. right?"
stories.yml
version: "2.0"
stories:
- story: place_order
  steps:
  - intent: greet
  - action: utter_greet
  - intent: place_order
  - action: utter_order_list
Debug output: it recognizes the entities, but the values are not passed to the slots:
Hey! can I assist you ?
Your input -> I would like to place an order for large blue shoes for my puppy
Received user message 'I would like to place an order for large blue shoes for my puppy' with intent '{'id': -2557752933293854887, 'name': 'place_order', 'confidence': 0.9996021389961243}' and entities '[{'entity': 'size', 'start': 35, 'end': 40, 'confidence_entity': 0.9921159148216248, 'value': 'large', 'extractor': 'DIETClassifier'}, {'entity': 'color', 'start': 41, 'end': 45, 'confidence_entity': 0.9969255328178406, 'value': 'blue', 'extractor': 'DIETClassifier'}]'
Failed to replace placeholders in response 'your order is {size} [color} boots. right?'. Tried to replace 'size' but could not find a value for it. There is no slot with this name nor did you pass the value explicitly when calling the response. Return response without filling the response
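"slot" is an unknown keyword: write "slots" instead of "slot" in the domain file and it will work. Also note that the response template contains "[color}"; it must be "{color}" for the color slot to be filled in.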

MongoDB query to compute percentage

I am new to MongoDB and kind of stuck on this query. Any help/guidance will be highly appreciated. I am not able to calculate the percentage in the desired way: something in my pipeline is wrong, so the prerequisites of the percentage are not computed correctly. Below are my unsuccessful attempt and the desired output.
A single document in the collection looks like this:
_id : ObjectId("602fb382f060fff5419fd0d1")
time : "2019/05/02 00:00:00"
station_id : 3544
station_name : "Underhill Ave & Pacific St"
station_status : "In Service"
latitude : 40.6804836
longitude : -73.9646795
zipcode : 11238
borough : "Brooklyn"
neighbourhood : "Prospect Heights"
available_bikes : 5
available_docks : 21
The query I am trying to solve is:
Given a station_id (e.g., 522) and a num_hours (e.g., 3) passed as parameters:
- Consider only the measurements where station_status = “In Service”.
- Consider only the measurements for that specific “station_id”.
- Compute the percentage of measurements with available_bikes = 0 for each hour of the day (e.g., for the period [8am, 9am) the percentage is 15.06% and for the period [9am, 10am) the percentage is 27.32%).
- Sort the percentage results in decreasing order.
- Return the top “num_hours” documents.
The desired output is:
--- DOCUMENT 0 INFO ---
---------------------------------
hour : 19
percentage : 65.37
total_measurements : 283
zero_bikes_measurements : 185
---------------------------------
--- DOCUMENT 1 INFO ---
---------------------------------
hour : 21
percentage : 64.79
total_measurements : 284
zero_bikes_measurements : 184
---------------------------------
--- DOCUMENT 2 INFO ---
---------------------------------
hour : 00
percentage : 63.73
total_measurements : 284
zero_bikes_measurements : 181
My attempt is:
command_1 = {"$match": {"station_status": "In Service", "station_id": station_id, "available_bikes": 0}}
my_query.append(command_1)
command_2 = {"$group": {"_id": "null", "total_measurements": {"$sum": 1}}}
my_query.append(command_2)
command_3 = {"$project": {"_id": 0,
"station_id": 1,
"station_status": 1,
"hour": {"$substr": ["$time", 11, 2]},
"available_bikes": 1,
"total_measurements": {"$sum": 1}
}
}
my_query.append(command_3)
command_4 = {"$group": {"_id": "$hour", "zero_bikes_measurements": {"$sum": 1}}}
my_query.append(command_4)
command_5 = {"$project": {"percent": {
"$multiply": [{"$divide": ["$total_measurements", "$zero_bikes_measurements"]},
100]}}}
my_query.append(command_5)
I've taken a look at this and I'm going to offer some sincere advice:
Don't try and do this in an aggregate query. Just go back to basics and pull the numbers out using find()s and then calculate the numbers in python.
If you want to persist with an aggregate query: your $match stage filters on available_bikes equal to zero, so you never have the total number of measurements and can never compute the percentage. Also, after your first $group you lose your projection, so at that point in the pipeline you only have total_measurements and nothing else (comment out commands 3 to 5 to see what I mean).
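Both routes can be sketched in Python (pymongo style), under stated assumptions. build_pipeline is one possible corrected aggregation: it matches every In Service measurement for the station (not only the zero-bike ones), counts the total and the zero-bike measurements in a single $group keyed by hour, then computes the percentage, sorts and limits. $round needs MongoDB 4.2 or later. top_zero_bike_hours is the find()-then-Python alternative the answer recommends. Both function names are hypothetical, and neither has been run against the asker's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_pipeline(station_id: int, num_hours: int) -> List[Dict]:
    """Per-hour share of In Service measurements with 0 bikes (hypothetical helper)."""
    return [
        # Keep ALL measurements for the station so the per-hour total survives.
        {"$match": {"station_status": "In Service", "station_id": station_id}},
        {"$group": {
            "_id": {"$substr": ["$time", 11, 2]},  # "2019/05/02 19:00:00" -> "19"
            "total_measurements": {"$sum": 1},
            # count only the zero-bike measurements inside the same group
            "zero_bikes_measurements": {
                "$sum": {"$cond": [{"$eq": ["$available_bikes", 0]}, 1, 0]}
            },
        }},
        {"$project": {
            "_id": 0,
            "hour": "$_id",
            "total_measurements": 1,
            "zero_bikes_measurements": 1,
            "percentage": {"$round": [{"$multiply": [
                {"$divide": ["$zero_bikes_measurements", "$total_measurements"]}, 100]}, 2]},
        }},
        {"$sort": {"percentage": -1}},
        {"$limit": num_hours},
    ]


def top_zero_bike_hours(docs: Iterable[Dict], station_id: int, num_hours: int) -> List[Dict]:
    """Same computation in plain Python over documents fetched with find()."""
    totals: Dict[str, int] = defaultdict(int)
    zeros: Dict[str, int] = defaultdict(int)
    for doc in docs:
        if doc["station_status"] != "In Service" or doc["station_id"] != station_id:
            continue
        hour = doc["time"][11:13]  # two-character hour, e.g. "08"
        totals[hour] += 1
        if doc["available_bikes"] == 0:
            zeros[hour] += 1
    rows = [
        {
            "hour": hour,
            "percentage": round(100 * zeros[hour] / total, 2),
            "total_measurements": total,
            "zero_bikes_measurements": zeros[hour],
        }
        for hour, total in totals.items()
    ]
    rows.sort(key=lambda r: r["percentage"], reverse=True)  # decreasing percentage
    return rows[:num_hours]
```

With pymongo this would be list(collection.aggregate(build_pipeline(522, 3))) or top_zero_bike_hours(collection.find({"station_id": 522}), 522, 3). For an hour with 185 zero-bike measurements out of 283, both give a percentage of 65.37, matching the desired output.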

select and calculate a new column in a spark dataframe pyspark

I have a spark dataframe with this format:
+-----------------+------------+--------------------+----------------+----------------+
|opp_id__reference|oplin_status|               stage|      std_amount| std_line_amount|
+-----------------+------------+--------------------+----------------+----------------+
|  OP-171102-67318|         Won|7 - Deliver & Val...|  6243.316662349|6243.31666234948|
|  OP-180910-77114|         Won|7 - Deliver & Val...|5014.57880858921|5014.57880858921|
|  OP-180910-76544|     Pending|7 - Deliver & Val...|5014.57880858921|5014.57880858921|
|  OP-180910-76544|     Pending|7 - Deliver & Val...|5014.57880858921|5614.57880858921|
|  OP-180910-76544|         Won|7 - Deliver & Val...|5014.57880858921|5994.57880858921|
+-----------------+------------+--------------------+----------------+----------------+
I would like to extract the list of opp_id__reference values for which the sum of std_line_amount over the records with oplin_status = "Pending" is bigger than std_amount.
This is how I did it:
# select opp_line which stage =='7 - Deliver & Validate' and oplin_status =='Pending'
DF_BR8 = df.filter(df.stage.contains("7 - Deliver")).select('opp_id__reference', 'oplin_status', 'stage', 'std_amount', 'std_line_amount')
DF_BR8_1 = DF_BR8.groupby('opp_id__reference', 'std_amount', 'oplin_status').agg({'std_line_amount': 'sum'}).withColumnRenamed('sum(std_line_amount)','sum_column')
DF_res = DF_BR8_1.filter(DF_BR8_1.oplin_status.contains("Pending"))
DF_res1 =DF_res.filter(DF_res.sum_column <= 0.3*DF_BR8_1.std_amount)
My question: is what I did correct? Is there a simpler way to do it?
Thanks
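A plain-Python sketch of the intended logic, under stated assumptions. It follows the stated goal (the Pending sum is bigger than std_amount), whereas the code above keeps rows with sum_column <= 0.3*std_amount, so swap that rule in if it is the real one. The main simplification is to filter on oplin_status before grouping, so oplin_status no longer needs to be part of the groupBy key. The function name pending_over_amount and the rows are hypothetical (copied from the sample table, with the stage spelled out as in the code comment). In PySpark, the same steps would roughly be df.filter(F.col("stage").startswith("7 - Deliver") & (F.col("oplin_status") == "Pending")).groupBy("opp_id__reference", "std_amount").agg(F.sum("std_line_amount").alias("pending_sum")).filter(F.col("pending_sum") > F.col("std_amount")).select("opp_id__reference"), with F being pyspark.sql.functions; that chain is untested here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (opp_id__reference, oplin_status, stage, std_amount, std_line_amount)
Row = Tuple[str, str, str, float, float]


def pending_over_amount(rows: List[Row], stage_prefix: str = "7 - Deliver") -> List[str]:
    """Return the opp_id__reference values whose summed Pending std_line_amount
    is bigger than std_amount (hypothetical helper mirroring the PySpark steps)."""
    sums: Dict[Tuple[str, float], float] = defaultdict(float)
    for opp_id, status, stage, std_amount, line_amount in rows:
        # 1. filter first on stage AND status
        if stage.startswith(stage_prefix) and status == "Pending":
            # 2. group by (opp_id__reference, std_amount) and sum std_line_amount
            sums[(opp_id, std_amount)] += line_amount
    # 3. keep the groups whose Pending sum exceeds std_amount
    return sorted({opp_id for (opp_id, std_amount), total in sums.items() if total > std_amount})
```

On the sample table, only OP-180910-76544 qualifies: its two Pending lines sum to about 10629.16, which is bigger than its std_amount of about 5014.58.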

How to comment on a specific line number on a PR on github

I am trying to write a small script that can comment on GitHub PRs using ESLint output.
The problem is that ESLint gives me the absolute line number for each error,
but the GitHub API wants the line number relative to the diff.
From the github API docs: https://developer.github.com/v3/pulls/comments/#create-a-comment
To comment on a specific line in a file, you will need to first
determine the position in the diff. GitHub offers a
application/vnd.github.v3.diff media type which you can use in a
preceding request to view the pull request's diff. The diff needs to
be interpreted to translate from the line in the file to a position in
the diff. The position value is the number of lines down from the
first "@@" hunk header in the file you would like to comment on.
The line just below the "@@" line is position 1, the next line is
position 2, and so on. The position in the file's diff continues to
increase through lines of whitespace and additional hunks until a new
file is reached.
So if I want to add a comment on new line number 5 in the example diff, I would need to pass 12 to the API.
My question is: how can I easily map the new line numbers that ESLint gives in its error messages to the relative positions required by the GitHub API?
What I have tried so far
I am using parse-diff to convert the diff provided by the GitHub API into a JSON object:
[{
  "chunks": [{
    "content": "@@ -OLD_STARTING_LINE_NUMBER,OLD_TOTAL_LINES +NEW_STARTING_LINE_NUMBER,NEW_TOTAL_LINES @@",
    "oldStart": NUMBER,
    "oldLines": NUMBER,
    "newStart": NUMBER,
    "newLines": NUMBER,
    "changes": [
      {
        "type": STRING("normal"|"add"|"del"),
        "normal": BOOLEAN,
        "add": BOOLEAN,
        "del": BOOLEAN,
        "ln1": OLD_LINE_NUMBER,   // normal changes
        "ln2": NEW_LINE_NUMBER,   // normal changes
        "ln": LINE_NUMBER,        // add/del changes
        "content": STRING
      }
    ]
  }]
}]
I am thinking of the following algorithm:
- for each file, make an array of new line numbers from NEW_STARTING_LINE_NUMBER to NEW_STARTING_LINE_NUMBER + NEW_TOTAL_LINES
- subtract newStart from each number to get another array, relativeLineNumbers
- traverse the array and, for each deleted line (type === 'del'), increment the remaining relativeLineNumbers
- for each additional hunk (a line starting with @@), decrement the remaining relativeLineNumbers
I have found a solution. I didn't post it earlier because it is just simple looping, nothing special, but I'm answering now to help others.
I have opened a pull request that recreates the situation shown in the question:
https://github.com/harryi3t/5134/pull/7/files
Using the GitHub API, one can get the diff data.
diff --git a/test.js b/test.js
index 2aa9a08..066fc99 100644
--- a/test.js
+++ b/test.js
@@ -2,14 +2,7 @@
var hello = require('./hello.js');
-var names = [
- 'harry',
- 'barry',
- 'garry',
- 'harry',
- 'barry',
- 'marry',
-];
+var names = ['harry', 'barry', 'garry', 'harry', 'barry', 'marry'];
var names2 = [
'harry',
@@ -23,9 +16,7 @@ var names2 = [
// after this line new chunk will be created
var names3 = [
'harry',
- 'barry',
- 'garry',
'harry',
'barry',
- 'marry',
+ 'marry', 'garry',
];
Now just pass this data to the parse-diff module and do the computation:
var parseDiff = require('parse-diff');

var parsedFiles = parseDiff(data); // data: the diff text returned by the GitHub API
parsedFiles.forEach(function (file) {
  var relativeLine = 0;
  file.chunks.forEach(function (chunk, index) {
    // Every hunk header after the first one also occupies a position,
    // which is why relative line 16 is missing from the output below.
    if (index !== 0) {
      relativeLine++;
    }
    chunk.changes.forEach(function (change) {
      relativeLine++;
      console.log(
        change.type,
        change.ln1 ? change.ln1 : '-', // old line number (normal changes)
        change.ln2 ? change.ln2 : '-', // new line number (normal changes)
        change.ln ? change.ln : '-',   // line number of an added/deleted line
        relativeLine
      );
    });
  });
});
This would print
Columns: type | old line (ln1) | new line (ln2) | added/deleted line (ln) | relative line
normal 2 2 - 1
normal 3 3 - 2
normal 4 4 - 3
del - - 5 4
del - - 6 5
del - - 7 6
del - - 8 7
del - - 9 8
del - - 10 9
del - - 11 10
del - - 12 11
add - - 5 12
normal 13 6 - 13
normal 14 7 - 14
normal 15 8 - 15
normal 23 16 - 17
normal 24 17 - 18
normal 25 18 - 19
del - - 26 20
del - - 27 21
normal 28 19 - 22
normal 29 20 - 23
del - - 30 24
add - - 21 25
normal 31 22 - 26
Now you can use the relative line number to post a comment using github api.
For my purpose I only needed the relative line numbers for the newly added lines, but using the table above one can get it for deleted lines also.
Here's the link for the linting project in which I used this. https://github.com/harryi3t/lint-github-pr
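For anyone not using Node, the same counting rule can be sketched in Python directly on the raw diff text, without parse-diff. This is a sketch under assumptions: it handles one file's diff at a time (a multi-file diff would first need splitting on its "diff --git" lines), it assumes "\ No newline at end of file" markers do not count as positions, and the function name new_line_positions is hypothetical. It also only maps lines that exist in the new file, which is what ESLint reports. GitHub's current REST documentation also describes line and side parameters for review comments, which may make this mapping unnecessary; check the docs for the API version in use.

```python
import re
from typing import Dict

# Matches a unified-diff hunk header such as "@@ -23,9 +16,7 @@ var names2 = ["
# and captures the starting line number in the new file.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def new_line_positions(file_diff: str) -> Dict[int, int]:
    """Map new-file line numbers to GitHub diff positions for ONE file's diff.

    Position 1 is the line right after the first hunk header; every later
    line, including later hunk headers, adds one position.
    """
    positions: Dict[int, int] = {}
    position = 0
    new_line = 0
    in_hunk = False
    for line in file_diff.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            if in_hunk:
                position += 1  # a later hunk header occupies a position itself
            in_hunk = True
            new_line = int(header.group(1))
            continue
        if not in_hunk:
            continue  # file header lines: diff --git, index, ---, +++
        if line.startswith("\\"):
            continue  # "\ No newline at end of file": assumed not to count
        position += 1
        if line.startswith("-"):
            continue  # deleted line: has a position but no new-file line number
        # added ("+") or context (" ") line: present in the new file
        positions[new_line] = position
        new_line += 1
    return positions
```

For an ESLint error on line N, new_line_positions(diff).get(N) gives the position to send, or None when line N is not part of the diff (positions only exist for lines inside the diff).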

If Expressions Don't Work

I'm having a problem getting if expressions to work. The first if branch is always taken, even when its expression should evaluate to FALSE.
For example, when I run this script, %X% should have a value of 10 after it has run twice (5 after the first run, 10 after the second), so %length% should be 2 on the second run.
The message box that I get on the second run says "The length of InputVar is 2. - One - 2" all the way up through the 19th run which says "The length of InputVar is 2. - One - 19". Then when it hits runs 20 (through 22), it says "The length of InputVar is 3. - One - 20".
What am I doing wrong?
^1::
X:=0
Y:=0
Loop, 22
{
Y:=++Y
X:=5+X
InputVar:=X
StringLen, length, InputVar
if (%length%<2)
{
MsgBox, 1, Length, The length of InputVar is %length%. - One - %Y%, 2
}
else if (%length%==2)
{
MsgBox, 1, Length, The length of InputVar is %length%. - Two - %Y%, 2
}
else if (%length%>2)
{
MsgBox, 1, Length, The length of InputVar is %length%. - Three - %Y%, 2
}
else
{
MsgBox, 1, Length, The length of InputVar is %length%. - Unknown - %Y%, 2
}
Sleep 500
}
Return
;These are the written numbers I should expect to be paired up with %Y%.
;One - 1
;Two - 2-19
;Three - 20-22
I edited the original post. Check the first line.
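(Solved! I needed to get rid of the % signs around length when comparing it against a number, e.g. if (length < 2). Inside an expression, %length% is a double dereference: it refers to the variable whose name is the value of length (such as a variable named 2), which is normally empty, so the first comparison kept succeeding.)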
Edit: And, BGM, did you know you have to wait a day to mark your own answer as the correct answer? I won't be back tomorrow :)