How to create an index that shows which line a word is on, preferably using enumerate?

filename = input('Please give the name of the file: ')
with open(filename) as f:
    lines = f.readlines()
for i, line in enumerate(lines, start=1):
    print(i, line.rstrip('\n'))
I've numbered the lines of the text document.
How do I create another index that shows each word and the line numbers on which it appears?
It should look like this:
Numbered lines in the text document:
1)test
2)this
3)this
4)this
5)dog
6)dog
7)cat
8)cat
9)hamster
10)hamster
I'm struggling to produce this output:
index:
this [2,3,4]
test [1]
dog [5,6]
cat [7,8]
hamster [9,10]

This is a job for a dictionary, in which you can easily store the line numbers for each word:
index = {}
for i, line in enumerate(lines, 1):
    word = line.strip()  # drop the trailing newline so "this\n" and "this" are the same key
    if word not in index:
        index[word] = []
    index[word].append(i)
for word in index:
    print(word, index[word], sep='\t')
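If a line can contain more than one word, the same dictionary idea extends naturally. The following is a minimal Python sketch, assuming words are separated by whitespace; the function name build_word_index is hypothetical. It uses collections.defaultdict(list) so the membership check disappears, and it records each line number at most once per word.

```python
from collections import defaultdict


def build_word_index(lines):
    """Map each whitespace-separated word to the line numbers it appears on."""
    index = defaultdict(list)  # missing keys start out as an empty list
    for line_no, line in enumerate(lines, start=1):
        for word in line.split():
            # Record each line only once per word, even if the word repeats on it
            if not index[word] or index[word][-1] != line_no:
                index[word].append(line_no)
    return dict(index)


if __name__ == "__main__":
    sample = ["test\n", "this\n", "this\n", "this\n", "dog\n",
              "dog\n", "cat\n", "cat\n", "hamster\n", "hamster"]
    for word, line_numbers in build_word_index(sample).items():
        print(word, line_numbers, sep="\t")
```

Because dictionaries keep insertion order, words are listed in the order they first appear in the file.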
The numbers (which represent the line numbers) are part of the text document itself.
In that case, you have to separate the number from the word:
index = {}
for line in lines:
    # Each line looks like "3)this": split once on ')' into number and word
    i, word = line.strip().split(')', 1)
    if word not in index:
        index[word] = []
    index[word].append(int(i))
for word in index:
    print(word, index[word], sep='\t')
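To print the index in exactly the shape the question asks for (an "index:" header, then lines like "this [2,3,4]"), a small formatting step can be added. This is a sketch under the assumption that every non-blank line has the "N)word" form; index_numbered_lines and format_index are hypothetical names. Words come out in order of first appearance, so "test" is listed before "this".

```python
def index_numbered_lines(lines):
    """Build {word: [line numbers]} from lines shaped like '3)this'."""
    index = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. an empty last line in the file
        number, _, word = line.partition(")")
        index.setdefault(word, []).append(int(number))
    return index


def format_index(index):
    """Render the index as 'word [n1,n2,...]' lines under an 'index:' header."""
    rows = ["index:"]
    for word, numbers in index.items():
        rows.append(f"{word} [{','.join(str(n) for n in numbers)}]")
    return "\n".join(rows)


if __name__ == "__main__":
    sample = ["1)test", "2)this", "3)this", "4)this", "5)dog",
              "6)dog", "7)cat", "8)cat", "9)hamster", "10)hamster"]
    print(format_index(index_numbered_lines(sample)))
```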


How to search for Roman numerals in a string with Dart?

I'm new to Flutter. I have a string containing Roman numerals that mark steps, and I want to put each step on a new line, but I don't know how. I tried string.replaceAll(), but it doesn't work because there are many Roman numerals and some of them overlap, such as i and ii. For example, I have this string:
String text = 'some text here i) some step 1 ii) some step 2 iii) some step 3. some text after step blabla ';
I want the output to have '\n' in front of each Roman numeral, which will arrange the output like this:
some text here
i) some step 1
ii) some step 2
iii) some step 3
some text after step blabla
Is there something I can use to detect the numerals and add '\n' in front of them in the string, or is there some other way?
There is probably a simpler way to do this with some sort of regex that I don't know about, but something like this should work.
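String text = 'Your String';
List<String> words = text.split(' ');
List<String> parts = [];
for (var word in words) {
  // Put a line break in front of every token that ends with ')', e.g. "ii)"
  parts.add(word.endsWith(')') ? '\n$word' : word);
}
// Re-join with spaces, drop the space left before each break, and trim the ends
String result = parts.join(' ').replaceAll(' \n', '\n').trim();
//result now contains the desired string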
It is tricky to know whether "some text after step blabla" should get a line break in front of it, because I don't know what to look for there. You would have to specify that more precisely.
Here is an approach that uses the Numerus package to check for valid Roman numerals.
Give this a go:
import 'package:numerus/numerus.dart';

void main() {
  String text = 'some text here i) some step 1 ii) some step 2 iii) some step 3. ao) invalid roman numberal. some textafter';
  RegExp regexp = RegExp(r"((\w+)\))");
  final stringWithLinebreaks = text.replaceAllMapped(regexp, (match) {
    // Only break the line when the token before ')' is a valid Roman numeral
    return match.group(2)!.isValidRomanNumeral()
        ? '\n${match.group(1)}'
        : '${match.group(1)}';
  });
  print(stringWithLinebreaks);
}
That will print out:
some text here
i) some step 1
ii) some step 2
iii) some step 3. ao) invalid roman numberal. some textafter
You could, of course, improve this in several ways, such as converting each Roman numeral to an int value using toRomanNumeralValue() and then sorting the steps if they appear out of order in the text. You could also make the RegExp more precise, for instance by replacing \w with [iIvVxXlLcCdDmM].
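For readers who want the same checks without the Numerus package, here is a minimal Python sketch of the two operations mentioned above: validating a standard-form Roman numeral with a strict regular expression and converting it to an int, then sorting steps by that value. The names is_valid_roman, roman_to_int and sort_steps are hypothetical, and this is not how Numerus itself is implemented. It accepts numerals from I to MMMCMXCIX, in either case.

```python
import re

# Standard-form Roman numerals from 1 to 3999 (case-insensitive).
ROMAN_RE = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})", re.IGNORECASE
)
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def is_valid_roman(token):
    """True if token is a non-empty, standard-form Roman numeral."""
    return bool(token) and ROMAN_RE.fullmatch(token) is not None


def roman_to_int(token):
    """Convert a valid Roman numeral to an int (subtractive pairs like IV = 4)."""
    digits = [VALUES[ch] for ch in token.upper()]
    total = 0
    for i, value in enumerate(digits):
        # A smaller digit written before a larger one is subtracted (the I in IX)
        if i + 1 < len(digits) and value < digits[i + 1]:
            total -= value
        else:
            total += value
    return total


def sort_steps(steps):
    """Sort step strings like 'ii) do this' by the value of their numeral."""
    return sorted(steps, key=lambda s: roman_to_int(s.split(")", 1)[0]))
```

For example, sort_steps(["iii) c", "i) a", "ii) b"]) puts the steps back in the order i, ii, iii.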
I found an answer written in JavaScript and modified it a bit. This is my answer. It only finds Roman numerals from 1 to 10, but that's fine since the number of steps never exceeds 10.
String formatText(String text) {
  // Matches i) to x); \b stops it from matching the end of a word such as "hi)"
  RegExp regExp = RegExp(r'\b(?:viii?|i(?:ii?|[vx])?|vi?|x)\)');
  String value = text.replaceAllMapped(regExp, (match) => "\n${match.group(0)}");
  return value;
}
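The same 1-to-10 pattern ports directly to Python's re module. In this sketch (format_text is a hypothetical name) the pattern is anchored with a \b word boundary, so a word that merely ends in a numeral-like letter, such as "hi)", is not split, and the space before each marker is consumed so lines do not end with trailing whitespace.

```python
import re

# Optional whitespace, a word boundary, then a lowercase numeral i..x and ")".
STEP_RE = re.compile(r"\s*\b((?:viii?|i(?:ii?|[vx])?|vi?|x)\))")


def format_text(text):
    """Replace the space before every step marker with a newline."""
    return STEP_RE.sub(lambda m: "\n" + m.group(1), text)


if __name__ == "__main__":
    print(format_text("some text here i) some step 1 ii) some step 2 "
                      "iii) some step 3. some text after step blabla"))
```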

How do I find letters in words that are part of a string and remove them? (List comprehensions with if statements)

I'm trying to remove vowels from a string. Specifically, remove vowels from words that have more than 4 letters.
Here's my thought process:
(1) First, split the string into an array.
(2) Then, loop through the array and identify words that are more than 4 letters.
(3) Third, replace vowels with "".
(4) Lastly, join the array back into a string.
Problem: I don't think the code is looping through the array.
Can anyone find a solution?
def abbreviate_sentence(sent):
    split_string = sent.split()
    for word in split_string:
        if len(word) > 4:
            abbrev = word.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", "")
            sentence = " ".join(abbrev)
            return sentence

print(abbreviate_sentence("follow the yellow brick road")) # => "fllw the yllw brck road"
I just figured out that the "abbrev = word.replace..." line was incomplete.
I changed it to:
abbrev = [words.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", "") if len(words) > 4 else words for words in split_string]
I found part of the solution here: Find and replace string values in list.
It is called a list comprehension.
I also found List Comprehension with If Statement.
The new lines of code look like:
def abbreviate_sentence(sent):
    split_string = sent.split()
    # Strip vowels only from words longer than 4 letters; keep the others unchanged
    abbrev = [word.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", "")
              if len(word) > 4 else word
              for word in split_string]
    return " ".join(abbrev)

print(abbreviate_sentence("follow the yellow brick road")) # => "fllw the yllw brck road"
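As an alternative to chaining five replace() calls, str.maketrans and str.translate can delete every vowel in one pass. A minimal sketch follows; note that, unlike the code above, it also removes uppercase vowels, which is an assumption about what is wanted.

```python
# Translation table that deletes vowels (both cases) and maps nothing else.
VOWELS = str.maketrans("", "", "aeiouAEIOU")


def abbreviate_sentence(sent):
    """Drop vowels from words longer than 4 letters; keep shorter words intact."""
    return " ".join(
        word.translate(VOWELS) if len(word) > 4 else word
        for word in sent.split()
    )


if __name__ == "__main__":
    print(abbreviate_sentence("follow the yellow brick road"))
```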

Remove white spaces in scala-spark

I have a sample file record like this:
2018-01-1509.05.540000000000001000000751111EMAIL#AAA.BB.CL
The record above comes from a fixed-length file, and I want to split it based on the field lengths.
When I split it, I get the list shown below:
ListBuffer(2018-01-15, 09.05.54, 00000000000010000007, 5, 1111, EMAIL#AAA.BB.CL)
Everything looks fine so far, but I am not sure why an extra space is added to each field in the list (except the first one).
Example: my data is "09.05.54", but I am getting " 09.05.54" in the list.
My logic for splitting is shown below:
import scala.collection.mutable.ListBuffer

// Logic to split the line based on the lengths
def splitLineBasedOnLengths(line: String, lengths: List[String]): ListBuffer[Any] = {
  var splittedLine = line
  val split = new ListBuffer[Any]()
  for (i <- lengths) {
    val c = i.toInt
    val fi = splittedLine.take(c)  // next field
    split += fi
    splittedLine = splittedLine.drop(c)  // remainder of the line
  }
  split
}
The code above takes the line and a List[String] of field lengths as input and returns a ListBuffer[Any] holding the line split according to those lengths.
Can anyone help me understand why I am getting an extra space before each field after splitting?
There are no extra spaces in the data. When the ListBuffer is printed, its toString puts ", " between the elements to make them easier to read, and that space is what you are seeing.
To prove this, try the following code:
split.foreach(s => println(s"\"$s\""))
You will see the following printed:
"2018-01-15"
"09.05.54"
"00000000000010000007"
"5"
"1111"
"EMAIL#AAA.BB.CL"
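The same check can be reproduced in Python, which makes the point from the answer concrete: a list's printed form also puts ", " between elements, while repr() of each element shows its exact contents. The helper split_fixed_width is a hypothetical name; it cuts the record at the running totals of the field lengths. In this sketch the record is rebuilt from the fields shown above, so the lengths are guaranteed to match.

```python
from itertools import accumulate


def split_fixed_width(line, lengths):
    """Cut line into consecutive fields of the given widths."""
    ends = list(accumulate(lengths))  # running end offset of each field
    starts = [0] + ends[:-1]          # each field starts where the previous ended
    return [line[s:e] for s, e in zip(starts, ends)]


if __name__ == "__main__":
    fields = ["2018-01-15", "09.05.54", "00000000000010000007",
              "5", "1111", "EMAIL#AAA.BB.CL"]
    record = "".join(fields)
    lengths = [len(f) for f in fields]
    for field in split_fixed_width(record, lengths):
        print(repr(field))  # the quotes make any stray space visible
```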

Python cut words in list after certain length

I have a list of entries from a CSV file in records. I want to limit each element to 50 characters and save the result back into the list, but my approach does not work.
import csv

def readfile():
    records = []
    with open(fpath, 'r') as csvfile:
        csvreader = csv.reader(csvfile, delimiter='|')
        for row in csvreader:
            if len(row) == 0:
                continue
            records.append([row[1]] + [x.strip() for x in row[3]])
    return records

def cut_words(records):
    for lines in records:
        for word in lines:
            word = word[0:50]
    return records
The shortened values do not seem to be saved in the list. Thanks!
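The assignment word = word[0:50] only rebinds the loop variable; it never writes back into the row, so records is unchanged. Below is a minimal sketch of two ways to store the shortened strings (cut_words_in_place and the max_len parameter are hypothetical additions). Separately, [x.strip() for x in row[3]] iterates over the characters of row[3], so each row may end up holding single characters rather than a whole field; [row[3].strip()] may be what was intended.

```python
def cut_words(records, max_len=50):
    """Return a new list of rows where every string is at most max_len characters."""
    # Building new lists is what actually stores the shortened strings.
    return [[word[:max_len] for word in row] for row in records]


def cut_words_in_place(records, max_len=50):
    """Same idea, but overwrite the strings inside the existing row lists."""
    for row in records:
        for i, word in enumerate(row):
            row[i] = word[:max_len]  # index assignment mutates the row itself
    return records
```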

Delete first two characters of an alphanumeric sequence in a column

I'm trying to scan a column for string length. If a value is 12 alphanumeric characters long, its first two characters should be deleted; those two characters are always the digits "10". See the sample below:
H063088955
F243066424
10G403085387
F253066457
E473057375
G503087343
10H303098124
G093075912
G433084322
10G403085388
Select the cells you wish to process and run:
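Sub qwerty()
    Dim r As Range
    Dim v As String
    For Each r In Selection
        v = r.Text
        ' 12-character values carry the leading "10": keep everything from character 3 on
        If Len(v) = 12 Then
            r.Value = Mid(v, 3)
        End If
    Next r
End Sub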
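For comparison, here is the same rule expressed in Python, as a hedged sketch: strip_prefix_if_long is a hypothetical name, and unlike the macro it also checks that the value actually starts with "10" before removing the first two characters, which is an assumption about the data.

```python
def strip_prefix_if_long(value, length=12, prefix="10"):
    """Drop the leading prefix from values of exactly `length` characters."""
    if len(value) == length and value.startswith(prefix):
        return value[len(prefix):]
    return value


if __name__ == "__main__":
    column = ["H063088955", "F243066424", "10G403085387", "F253066457"]
    print([strip_prefix_if_long(v) for v in column])
```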