How to get a substring that ends before a certain symbol

One of the variables I have in the file has the following format:
Bachelor of Commerce - AD - Accounting-Maj
Bachelor of Commerce - Finance-Maj
Bachelor of Commerce - Finance-Maj/Accounting-Min
BSc with Specialization - Math & Finance-Maj
BSc in Agric/Food Bus Mngmnt - Agric Business Management-Maj
Bachelor of Commerce - Management Info Systems-Maj
What I would like to do is take the first part of the string, before the first - symbol.
For example, from the first three lines I need to get Bachelor of Commerce.
I would appreciate it if somebody could tell me the easiest way to do this.

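Try this, assuming your variable is named string_var:
split string_var, parse(" -") limit(1) gen(substring_before_first_hyphen)
Note that gen() specifies a stub, so the new variable will be named substring_before_first_hyphen1.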

For future questions, please post attempted code and why it's not working for you. Questions asking only for code are deemed off-topic by some users.
Here is one way:
clear all
set more off
*----- example data -----
set obs 2
gen degree = "Bachelor of Commerce - AD - Accounting-Maj"
replace degree = "Bachelor of Something" in 2
list
*----- what you want -----
gen degree2 = trim(substr(degree, 1, strpos(degree, "-") - 1))
replace degree2 = degree if missing(degree2)
list
This takes the substring of degree that starts at position 1 and ends one position before the first -. trim() removes any leading or trailing blanks. If the original value contains no -, degree2 ends up empty (missing), so the replace fills it in with the original value.
See help string functions for an array of functions that can be used to manipulate strings.
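For comparison, the same logic (locate the first -, keep what precedes it, trim, and fall back to the full value when there is no -) can be sketched in Python. This is only an illustration outside Stata; the function name before_first_hyphen is hypothetical, and note that Python's str.find() returns -1 where Stata's strpos() returns 0.

```python
def before_first_hyphen(value: str) -> str:
    """Return the text before the first '-', trimmed; the whole value if there is no '-'."""
    pos = value.find("-")  # 0-based index of the first '-', or -1 if absent
    if pos == -1:
        # Mirrors the Stata fallback: replace degree2 = degree if missing(degree2)
        return value
    # Mirrors trim(substr(degree, 1, strpos(degree, "-") - 1))
    return value[:pos].strip()


degrees = [
    "Bachelor of Commerce - AD - Accounting-Maj",
    "Bachelor of Something",
]
for d in degrees:
    print(before_first_hyphen(d))
```

Because Stata positions are 1-based and Python indexes are 0-based, substr(degree, 1, strpos - 1) and value[:pos] select the same characters.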

The previous answers using substr() and split are probably better in Stata. I am posting a regular expression solution just for completeness:
clear
input strL degree
"Bachelor of Commerce - AD - Accounting-Maj"
"Bachelor of Commerce - Finance-Maj"
"Bachelor of Commerce - Finance-Maj/Accounting-Min"
"BSc with Specialization - Math & Finance-Maj"
"BSc in Agric/Food Bus Mngmnt - Agric Business Management-Maj"
"Bachelor of Commerce - Management Info Systems-Maj"
end
gen str=regexs(0) if regexm(degree,"^[^\-]*")==1
list str
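As a rough cross-check outside Stata, the same pattern can be tried with Python's re module (the helper name prefix_before_hyphen is hypothetical). It shows that ^[^-]* stops just before the first hyphen and therefore keeps the blank in front of it, so the match is worth trimming. A hyphen placed right after ^ inside the character class needs no escaping in Python.

```python
import re

# ^[^-]* matches every leading character that is not a hyphen
PATTERN = re.compile(r"^[^-]*")

degrees = [
    "Bachelor of Commerce - AD - Accounting-Maj",
    "BSc with Specialization - Math & Finance-Maj",
    "BSc in Agric/Food Bus Mngmnt - Agric Business Management-Maj",
]


def prefix_before_hyphen(value: str) -> str:
    match = PATTERN.match(value)  # always succeeds, since * allows an empty match
    # The raw match ends with the blank before " - ", so trim it
    return match.group(0).strip()


for d in degrees:
    print(repr(PATTERN.match(d).group(0)), "->", prefix_before_hyphen(d))
```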

String course = "Bachelor of Commerce - AD - Accounting-Maj";
If you want the substring before the '-' character, use the line below:
String requiredSubString = course.split("-")[0].trim();
In the code above, split() returns an array of strings separated by the '-' character, and you can then get the required substring by its index. Here we take the string at index 0; trim() removes the blank left before the '-'.
i.e. Bachelor of Commerce
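The same split-and-take-index-0 idea, sketched in Python for illustration. Passing maxsplit=1 stops after the first hyphen, and strip() removes the blank that splitting leaves before the hyphen (a plain Java split("-")[0] keeps that trailing blank as well). If the string has no hyphen, element 0 is simply the whole string.

```python
course = "Bachelor of Commerce - AD - Accounting-Maj"

# Split on the first '-' only; element 0 is everything before it
required_substring = course.split("-", 1)[0].strip()
print(required_substring)  # Bachelor of Commerce
```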

One could also use the egen command with its ends() function and the associated punct option:
clear
input strL string
"Bachelor of Commerce - AD - Accounting-Maj"
"Bachelor of Commerce - Finance-Maj"
"Bachelor of Commerce - Finance-Maj/Accounting-Min"
"BSc with Specialization - Math & Finance-Maj"
"BSc in Agric/Food Bus Mngmnt - Agric Business Management-Maj"
"Bachelor of Commerce - Management Info Systems-Maj"
end
egen new_string = ends(string), punct(-)
list new_string
+-------------------------------+
| new_string |
|-------------------------------|
1. | Bachelor of Commerce |
2. | Bachelor of Commerce |
3. | Bachelor of Commerce |
4. | BSc with Specialization |
5. | BSc in Agric/Food Bus Mngmnt |
|-------------------------------|
6. | Bachelor of Commerce |
+-------------------------------+

Related

OpenAPI 3: add multiple random examples for a single string item

I have a text.yaml that defines a single string item.
I want to add several random example strings to my documentation to represent real-life cases.
type: string
example:
- Example Text
- טקסט לדוגמא
- 123 text with numbers 456
description: localized text to show
I tried examples, but it's not supported.
This doc (https://swagger.io/docs/specification/2-0/adding-examples/) does not list any valid parameters for this.

Scala split on \n

I have a text file with the following contents:
"\n\n\n\n\n\n\n\n\t\n\t\t\t\n\t\t\t\t\t\n\t\t\t\t\n\t\t\n\n\n\t\n\t\t\n\t\t\t\t
Hotline: +49 40-300 51 701\n\t\n\t\n\t
Languages\n\t\n\t\n\t\t\n\t\t\n\t\t
Travel plan \n\t\n\t\n\n\n\n\t\t\n\n\t\t\n\t\t\t\n\n\n\n\n\n\n\n\n\n\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\n\n\t\t\n\n\t\t\n\t\t\t\t
Book\t
Packages from € 59\n
\tAccommodation and arrival\n
\tMusical packages\n
\tMaritime packages\n\t
Hamburg for Families\n\t
Experience Hamburg & Culture\n\n\n\n\n\t
Hotels from € 24\n\t
Book online now!\n\t
Theme hotels\n\t
Hotels by location\n\t
Special Offers\n\t
Hotels from A-Z\n\t
Other accommodation\n\n\n\n\n\t
Tickets from € 8\n\tBook online now!\n\t
Musicals Hamburg\n\tHamburg maritime\n\t
Sightseeing tours & city walks\n\tMuseums & Exhibitions\n\tHamburg for Families\n\n\n\n\n\t
Hamburg CARD\n\tBook online now!\n\tAll benefits at a glance\n\tFrequently asked questions\n\n\n\n\n\t
Group trips\n\tBooking request\n\tHamburg Guides and theme walks\n\n\n\n\n\n\n\t\n\t\tOffer\n\n\t\t\n\n\t\t\n\n\t\t
Hamburg CARD\n\t\tFree travel by bus, rail and ferry (HVV) and up to 50% discount on more than 150 tourist...\n\n\t\n\t\n\t\t\n\t\t\t\n\t\t\t\t
from 10,50 EUR\n\t\t\t\n\t\t\n\n\t\n\n\n\n\n\n\n\tAttractions\tBest of Hamburg\n\t
Town Hall\n\tThe \"Michel\"\n\tSt. Pauli & Reeperbahn\n\t
Elbphilharmonie\n\tJungfernstieg\n\tMiniatur Wunderland\n\tTierpark Hagenbeck\n\t
All about the Alster\n\tBlankenese\n\n\n\n\n\tHamburg Maritime\n\t
Urbanshore Hamburg\n\tPort of Hamburg\n\tLandungsbrücken\n\tFish Market\n\tSpeicherstadt\n\tOn the Elbe\n\tHafenCity\n\tWillkomm-Höft\n\tÖvelgönne\n\n\n\n\n\tHistoric Hamburg\n\tThe Old Elbe Tunnel\n\t"
I want to split it on \n. I tried:
string.split("\n")
string.split('\n')
string.split("""\n""")
string.split("\\n")
None of these seem to work. How do I do this in Scala?
Split by \n, then \t, flatten, then remove empty strings.
import scala.io.Source
val lines = Source.fromFile("/Users/rasika/Documents/example.txt").getLines.mkString
val result = lines.split("\\\\n").flatMap(_.split("\\\\t")).filter(_.nonEmpty).toList
Result
Hotline: +49 40-300 51 701
Languages
Travel plan
Book
Packages from € 59
Accommodation and arrival
Musical packages
Maritime packages
Hamburg for Families
Experience Hamburg & Culture
Hotels from € 24
Book online now!
Theme hotels
Hotels by location
Special Offers
Hotels from A-Z
Other accommodation
Tickets from € 8
Book online now!
Musicals Hamburg
Hamburg maritime
Sightseeing tours & city walks
Museums & Exhibitions
Hamburg for Families
Hamburg CARD
Book online now!
All benefits at a glance
Frequently asked questions
Group trips
Booking request
Hamburg Guides and theme walks
Offer
Hamburg CARD
Free travel by bus, rail and ferry (HVV) and up to 50% discount on more than 150 tourist...
from 10,50 EUR
(output truncated)
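The split / flatten / filter pipeline from this answer, sketched in Python for comparison. Like the answer, it assumes the file contains the literal two-character sequences \n and \t (a backslash followed by a letter) rather than real newline and tab characters; the sample string is a shortened, hypothetical excerpt. Unlike Scala's split, Python's str.split is literal rather than a regex, so only one escaped backslash is needed.

```python
# Shortened excerpt: in Python source, "\\n" and "\\t" are the literal
# two-character sequences backslash+n and backslash+t.
text = "\\n\\n\\t\\nHotline: +49 40-300 51 701\\n\\t\\nLanguages\\n\\t\\t\\nTravel plan \\n\\tBook\\t"

# Split on literal \n, split each piece on literal \t (flatten),
# then drop the empty pieces.
pieces = [
    part
    for chunk in text.split("\\n")
    for part in chunk.split("\\t")
    if part
]
print(pieces)
```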
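If you want to split on a literal \n in your text (i.e. the two characters backslash and n, not a newline character), try this:
string.split("\\\\n")
split takes a regular expression, so a literal backslash must be escaped once for the regex (\\) and each of those backslashes escaped again in the string literal, which is why it takes four backslashes in Java/Scala.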
Since you're splitting on newlines, and io.Source.fromFile(...).getLines already splits the file on newlines (dropping them), you'll need to read the whole file in one go instead, with
val string = io.Source.fromFile(filepath).mkString
as per this answer. Then your attempts should work, e.g.
string.split('\n')
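To see the difference between a literal backslash-n and a real newline when the splitter is a regular expression, here is the same distinction in Python's re module. Python is used only as an illustration; the two escaping layers (regex, then string literal) work analogously in Java/Scala, and Python's raw string r"..." removes the string-literal layer.

```python
import re

literal = "line one\\nline two"  # contains backslash + 'n', no real newline
real = "line one\nline two"      # contains an actual newline character

# Regex for a literal backslash followed by n is \\n; in a normal string
# literal each of those backslashes must be escaped again.
print(re.split("\\\\n", literal))  # ['line one', 'line two']
print(re.split(r"\\n", literal))   # same pattern, written as a raw string

# A pattern containing a real newline only splits the second string.
print(re.split("\n", literal))     # ['line one\\nline two'] (no split)
print(re.split("\n", real))        # ['line one', 'line two']
```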

RASA: Automatic generation of stories

I want to know whether it is possible to automatically generate stories to train the chatbot using RASA.
I built my training data by using the online training session to generate the stories, and I found it very impractical. I would like to know whether there is an automatic way to convert any conversation into stories.
You can use interactive learning to create training data while speaking to the bot. See the docs here for more information on how to use interactive learning.
This is probably a late response, but it is for the rest of the community who might still need an answer. A few months ago, I found that interactive learning took too long to train my RASA bot (especially with hundreds of intents and actions).
What I did was:
1) Compile all my named intents and actions (utter_) into a .csv file with the following column headers: [subject] | [intent_name] | [utter_name]
2) Convert each row's intent name and its matching action name into Markdown (.md), adding the strings needed to follow the format of the stories.md file.
Package this into a function and call it in a for loop:
subcell = "## " + column[0] + "\n" # [subject] header for stories
subcell += "* " + column[1] + "\n" + " - " + column[2] + "\n" + "\n" # [intent_name] + [utter_name]
3) The generated stories.md output will contain all the direct intent-action pairings for your simple conversation flows. Likewise, you can apply the same idea to generate domain.yml.
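A minimal, self-contained sketch of steps 1-3, assuming the CSV has a header row with the columns subject, intent_name and utter_name as described above. The in-memory CSV text and the function name rows_to_stories are hypothetical; in practice you would open your own .csv file and write the result to stories.md. The output copies the Markdown story layout of the snippet above, which is the format used by older Rasa versions (Rasa 2.x and later use YAML training data).

```python
import csv
import io

# Hypothetical sample data with the column headers described above
CSV_TEXT = """subject,intent_name,utter_name
greeting,greet,utter_greet
goodbye,goodbye,utter_goodbye
"""


def rows_to_stories(csv_file) -> str:
    """Turn each CSV row into one story: header, intent line, action line."""
    reader = csv.DictReader(csv_file)
    stories = []
    for row in reader:
        story = "## " + row["subject"] + "\n"        # [subject] story header
        story += "* " + row["intent_name"] + "\n"    # [intent_name]
        story += " - " + row["utter_name"] + "\n\n"  # [utter_name]
        stories.append(story)
    return "".join(stories)


md = rows_to_stories(io.StringIO(CSV_TEXT))
print(md)
# To save it: with open("stories.md", "w", encoding="utf-8") as f: f.write(md)
```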
Hope this helps!

iab.taxonome.org error code -5

I'm trying to use the iab.taxonome.org service to classify texts, and I get error response -5 (text too short).
Here is what I'm sending to the service:
https://rest.taxonome.org/v1/taxono?me=A college basketball game at Allen Fieldhouse, in Lawrence, Kansas, the home of the Kansas Jayhawks
The history of basketball is traced back to a YMCA International Training School, known today as Springfield College, located in Springfield, Massachusetts&token=[...MyKey...]&ver=1
I had the same issue. After checking with the taxonome support team, I found out that there is a requirement of at least 500 words per classification.
I have asked them to add this to the API reference page.
Edit after double-checking: it depends on which framework is used to send the data. If you are implementing the client yourself and not URL-encoding the string (e.g. a space must become %20), it won't work.
Check the API example here:
https://iab.taxonome.org/api
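A sketch of the URL-encoding point using Python's standard library. The endpoint is deliberately left out and the parameter name "text" is hypothetical (take the exact query layout from the API example linked above); only token and ver appear in the question, and MY_KEY is a placeholder. Passing quote_via=quote to urlencode makes spaces come out as %20 instead of the default +.

```python
from urllib.parse import quote, urlencode

# "text" is a hypothetical parameter name; token and ver come from the question
params = {
    "text": "A college basketball game at Allen Fieldhouse, in Lawrence, Kansas",
    "token": "MY_KEY",  # placeholder, not a real key
    "ver": "1",
}

# quote_via=quote encodes spaces as %20 (the default, quote_plus, uses '+')
query = urlencode(params, quote_via=quote)
print(query)
```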

Odoo / Postgres: Concatenate row values until condition = true

I'm creating a new view for a customized PoS module for my restaurant. It's a view specific to the kitchen, already filtered to show only the kitchen-related items of a PoS order.
Now I would like to show the products whose names start with "#" in a field, populated by a function, next to the "standard" products. These are really a set of instructions coded as products so that they are available in the PoS: to suit my needs, I decided to create kitchen instructions as products (of type service). In my PoS there are 3 categories that are not products but instructions (ADD, SUBTRACT, ADDITIONAL NOTE).
A typical order looks like this:
Product                  Quantity
[1007] Pom. Secchi       1
# + Búfala               1
# - Mussarela            1
[2002] Nutella banana    1
# + Mussarela            1
What I would like to achieve is a view like:
Product                  Note                       Quantity
[1007] Pom. Secchi       # + Búfala # - Mussarela   1
[2002] Nutella banana    # + Mussarela              1
I need to select only the strings starting with "#" that fall between two lines whose internal code starts with "[code]", and this is the hard part. I don't need to delete the values I "move", because I need the stock view when I print the receipt.
I will call this function from Python (with psycopg2 on PostgreSQL 9.4), either via on_change or by extending the workflow when an order is marked as "paid". I'm not sure whether doing it directly in Python is viable.
I've tried string_agg(), but with poor results.
Any tips about what should be the correct path to start coding? Thanks.
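Since the question leaves open whether to do this in Python, here is a minimal Python sketch of the grouping logic (plain Python, not SQL and not an Odoo API): walk the order lines in their original order, start a new row at every line that is not an instruction, and append each following "#" line to that row's note. It assumes the lines arrive already sorted (for example by order line id); the (name, quantity) tuples and the function name group_kitchen_lines are hypothetical stand-ins for whatever your query returns. A "#" line with no product before it is kept as its own row.

```python
from typing import List, Tuple

Line = Tuple[str, int]  # (product name, quantity) as read from the order lines


def group_kitchen_lines(lines: List[Line]) -> List[Tuple[str, str, int]]:
    """Attach each '#' instruction line to the closest preceding product line."""
    grouped: List[list] = []
    for name, qty in lines:
        if name.startswith("#") and grouped:
            # Instruction: append it to the note of the current product
            note = grouped[-1][1]
            grouped[-1][1] = (note + " " + name) if note else name
        else:
            # Regular product (or a stray '#' line with no product before it)
            grouped.append([name, "", qty])
    return [(product, note, qty) for product, note, qty in grouped]


order = [
    ("[1007] Pom. Secchi", 1),
    ("# + Búfala", 1),
    ("# - Mussarela", 1),
    ("[2002] Nutella banana", 1),
    ("# + Mussarela", 1),
]
for row in group_kitchen_lines(order):
    print(row)
```

The original order lines are only read, never modified, so the per-line stock view used for the receipt stays intact.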