Determine if string exists in file - autohotkey

I have a list of strings such as:
John
John Doe
Peter Pan
in a .txt file.
I want to make a loop that checks if a certain name exists. However, I do not want it to be true if I search for "Peter" and only "Peter Pan" exists. Each line has to be a full match.

Ha ha, ep0's answer is very sophisticated!
However, you probably want a parsing loop, something like this (this example expects each name to be on its own line). Consider that you have a text file with contents arranged like this:
John
Harry
Bob
Joe
Here is your script:
fileread, thistext, %whatfile% ; get the text from the file into a variable
; Now, loop through each line and see if it matches your results:
loop, parse, thistext, `n, `r ; split on linefeeds and drop any carriage returns
{
    if (a_loopfield = "John")
        msgbox, Hey! It's John!
    else
        msgbox, No, it's %a_loopfield%
}
If your names are arranged in a different format, you might have to either change the delimiter for the parsing loop, or use regex instead of just a simple comparison.

If you want to check for multiple names use a trie. If you have just one name, you can use KMP.
I'll explain this for multiple names you want to check that exist, since for only one, the example provided on Wikipedia is more than sufficient and you can apply the same idea.
Construct the trie from the names you want to find and, for each line in the file, traverse the trie character by character. The line is a full match only if its last character lands on a final node.
BONUS: the trie is used by the Aho-Corasick algorithm, which is an extension of KMP to multiple patterns. Read about it. It's very worthwhile.
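A minimal sketch of the trie idea in Python, for illustration only (the thread itself is about AutoHotkey). The helper names `build_trie` and `line_in_trie` are made up for this example, and the trie is assumed to be a plain nested dictionary in which the key `None` marks a final node.

```python
def build_trie(names):
    """Build a nested-dict trie; the key None marks the end of a name."""
    root = {}
    for name in names:
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
        node[None] = True  # final node: a complete name ends here
    return root


def line_in_trie(trie, line):
    """Return True only if the whole line is one of the stored names."""
    node = trie
    for ch in line:
        node = node.get(ch)
        if node is None:
            return False  # no stored name continues with this character
    return None in node  # the line must end exactly on a final node


# Names we are looking for, and the lines read from the file.
wanted = build_trie(["Peter", "John Doe"])
file_lines = ["John", "John Doe", "Peter Pan"]
found = [line for line in file_lines if line_in_trie(wanted, line)]
```

Here `found` contains only "John Doe": "Peter Pan" runs past the final node for "Peter", so it is not reported as a full-line match.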
UPDATE:
For checking if a single name exists, hash the name you want to find, then read the text file line by line. For each line, hash it with the same function and compare it to the hash of the name you want to find. If they are equal, compare the strings character by character. You need this last step to rule out false positives caused by hash collisions.
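The hash-then-compare steps could look roughly like this in Python. This is a sketch under assumptions: the function name `name_in_lines` is hypothetical, and Python's built-in `hash` stands in for whatever hash function you would actually pick.

```python
def name_in_lines(name, lines):
    """Return True if some line equals name exactly (full-line match)."""
    target = hash(name)
    for line in lines:
        candidate = line.rstrip("\r\n")  # drop the line terminator
        # Cheap check first: compare the hashes.
        if hash(candidate) == target:
            # Equal hashes can still be a collision, so confirm
            # with a real string comparison.
            if candidate == name:
                return True
    return False


lines = ["John\r\n", "John Doe\r\n", "Peter Pan\r\n"]
has_peter = name_in_lines("Peter", lines)
has_john = name_in_lines("John", lines)
```

With the names from the question, `has_peter` is False and `has_john` is True.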

Related

Using Perl to parse text from blocks

I have a file with multiple blocks of text. FOR EACH block of text, I want to be able to extract what is in the square brackets, the line containing the FIRST instance of the word "area", and what is to the right of the square brackets. Everything will be a string. Essentially what I want to do is store each string into a variable in a hash so I can print it into a 3 column csv file.
Here's a sample of what the file looks like:
Student-[K-6] Exceptional in Math
/home/area/kinder/mathadvance.txt, 12
Students in grade K-12 shown to be exceptional in math.
Placed into special after school program.
See /home/area/overall/performance.txt, 200
Student-[Junior] Weak Performance
Students with overall weak performance.
Summer program services offered as shown in
"/home/area/services/summer.txt", 212
Student-[K-6] Physical Exercise Time Slots
/home/area/pe/schedule.txt, 303
Assigned time slots for PE based on student's grade level. Make reference to
/home/area/overall/classtimes.txt, 90
I want to have a final csv file that looks like:
Grade,Topic,Path
K-6, Exceptional in Math, /home/area/kinder/mathadvance.txt, 12
K-6, Physical Exercise Time Slots, /home/area/pe/schedule.txt, 303
Junior, Weak Performance, "/home/area/services/summer.txt", 212
Since it's a csv file, I know it will also separate at the line number when exporting into excel but I'm fine with that.
I started off by putting the grade type into an array because I want to be able to add more strings to it for different grade levels.
My program looks like this so far:
#!/usr/bin/perl
use strict;
use warnings;
my @grades = ("K-6", "Junior", "Community-College", "PreK");
I was thinking that I will need to do some sort of system sed command to grab what is in the brackets and store it into a variable. Then I will grab everything to the right of the bracket on the line and store it into a variable. And then I will grep for a line containing "area" to get the path and I will store it as a string into a variable, put these in a hash, and then print into csv. I'm not sure if I'm thinking about this the right way. Also, I have NO IDEA how to do this for each BLOCK of text in the file. I need it by block because each block has its own corresponding grades, topics, and paths.
perl -000 -ne '($grade, $topic) = /\[(.*)\] (.*)/;
($path) = m{(.*/area/.*)};
print "$grade, $topic, $path\n"' -- file.txt
-000 turns on paragraph mode, so -n reads the input paragraph by paragraph instead of line by line (paragraphs are runs of text separated by blank lines).
/\[(.*)\] (.*)/ matches the square brackets and whatever follows them up to a newline. The inside of the square brackets and the following text are captured using the parentheses.
m{(.*/area/.*)} captures the first line containing "/area/". It uses the m{} syntax instead of // so we don't have to backslash the slashes (avoiding the so-called "leaning toothpick syndrome").
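For comparison, the same paragraph-by-paragraph idea can be sketched in Python. This is only an illustration: `parse_blocks` is a hypothetical helper, and, like the one-liner, it assumes that blocks are separated by blank lines.

```python
import re


def parse_blocks(text):
    """Return one (grade, topic, path) tuple per blank-line separated block."""
    rows = []
    for block in re.split(r"\n\s*\n", text.strip()):
        header = re.search(r"\[(.*)\] (.*)", block)  # [grade] topic
        path = re.search(r".*/area/.*", block)       # first line with /area/
        if header and path:
            rows.append((header.group(1), header.group(2), path.group(0)))
    return rows


sample = (
    "Student-[K-6] Exceptional in Math\n"
    "/home/area/kinder/mathadvance.txt, 12\n"
    "See /home/area/overall/performance.txt, 200\n"
    "\n"
    "Student-[Junior] Weak Performance\n"
    "Students with overall weak performance.\n"
    '"/home/area/services/summer.txt", 212\n'
)
rows = parse_blocks(sample)
```

Because `.` does not match a newline by default, each `.*` stays within one line, which is what keeps the captures from spilling into the rest of the block.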

Extract Data From Second Line of Output

I have a table that contains message_content. It looks like this:
message_content | WFUS54 ABNT 080344\r\r
| TORLCH\r\r
| TXC245-361-080415-\r
How would I extract only the 2nd line of that output (TORLCH)? I've tried to shorten the output to a certain number of characters, but that ultimately doesn't provide what I want. I've also tried removing carriage returns and newlines. I am outputting my results to a CSV that I could manipulate with Python, but I was wondering if there's a way to do it in the query first.
Based on other examples, it seems like I could use a regular expression to maybe do this? Not sure where to start with learning that though.
You can split the content into an array, then take the second element:
(string_to_array(message_content, e'\r\r'))[2]
Online example: https://rextester.com/MDYLXB40812
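Since the question mentions post-processing the CSV in Python as a fallback, here is a rough sketch of the same split-and-index idea there. It assumes each displayed row ends in a newline after the carriage returns shown above, which the table output suggests but does not prove. `second_line` is a hypothetical helper, and note that Python indexes from 0 where the SQL array indexes from 1.

```python
def second_line(message):
    """Return the second line of message without its carriage returns."""
    lines = [part.strip("\r") for part in message.split("\n")]
    return lines[1] if len(lines) > 1 else None


# Assumed raw value of message_content, including the line terminators.
raw = "WFUS54 ABNT 080344\r\r\nTORLCH\r\r\nTXC245-361-080415-\r\n"
product = second_line(raw)
```

With that assumed input, `product` is "TORLCH".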

getting the path name files in a directory in q

I am using q to get all the files listed in that directory:
key `:Dname
and then try to filter out the ones that start with numbers as:
key `:Dname like "[0-9]"
but the like part does not quite work. I tried get as well since I like the path to include the directory that the file is in.
Keep in mind that q evaluates expressions from right to left. Your code here will first evaluate
`:Dname like "[0-9]"
and apply key to the result.
You want something closer to
key[`:Dname] like "[0-9]"
But to get what you want, you'll have to add a wildcard to the pattern string that you're supplying and apply not to the result:
not key[`:Dname] like "[0-9]*"
This will give you a boolean vector. To return the list of files you want, use where and index:
key[`:Dname] where not key[`:Dname] like "[0-9]*"
If you have a filename defined like
filename:`2019.01.20file.txt
You can compare this to a pattern using like, similar to what you have done:
filename like "[0-9]*"
"*" is the wildcard symbol which means that anything can come after the [0-9]
like compares a string or symbol to a pattern
So this line returns 1b if the filename starts with a digit between 0 and 9.
Another method would be to compare the start of the filename to .Q.n which is a string of 0-9.
This can be achieved like so:
first[string filename] in .Q.n
string converts the symbol to a string for in to compare it to the string .Q.n
For your situation, I would recommend the first method.
q)key `:q
`README.txt`q.k`q.q`s.k`sp.q`w32
q)key[`:q] like "q*"
011000b
q)x where (x:key[`:q]) like "q*"
`q.k`q.q
q)x where not (x:key[`:q]) like "q*"
`README.txt`s.k`sp.q`w32
This method builds the Boolean list which indicates whether each file starts with "q", then:
Uses not to reverse the 1s and 0s of this list
Uses where to return the indexes at which the Boolean list is equal to 1
Indexes into key[`:q] with this list
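For readers more used to other languages, the mask, negate, and index steps above can be sketched in Python. This is purely an analogy, not q: the file names are copied from the session above, and `startswith` stands in for like "q*".

```python
names = ["README.txt", "q.k", "q.q", "s.k", "sp.q", "w32"]

# Boolean list: does each name start with "q"?  (q: key[`:q] like "q*")
mask = [name.startswith("q") for name in names]

# Keep the names where the negated mask is true.  (q: x where not ...)
kept = [name for name, hit in zip(names, mask) if not hit]
```

The `mask` list corresponds to 011000b in the q session, and `kept` holds the same four names as the final q result.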
I hope this helps

Matlab - Help in listing files using a name-pattern

I'm trying to create a function that lists the content of a folder based on a pattern, however the listing includes more files than needed. I'll explain by an example: Consider a folder containing the files
file.dat
file.dat._
file.dat.000
file.dat.001
...
file.dat.999
I am interested only in the files that are .000, .001 and so on. The files file.dat and file.dat._ are to be excluded.
The numbering can also be .0000, .0001 and so on, so the number of digits is not necessarily 3.
I tried using the dir command with the pattern file.dat.* - this included file.dat for some reason (why is the last dot treated differently?) and file.dat._, which was expected.
The "obvious" set of solutions is to add an additional regular expression or length check - however I would like to avoid that, if possible.
This needs to work both under UNIX and Windows (and preferably MacOS).
Any elegant solutions?
Get all filenames with dir and filter them with the regular expression '^file\.dat\.\d+$'. This matches:
start of the string (^)
followed by the string file.dat. (file\.dat\.)
followed by one or more digits (\d+)
and then the string must end ($)
Since names is a cell array of char vectors, regexp returns a cell array with the starting index of the match for each char vector. Because the pattern is anchored at the start, that index can only be 1 or [], so any is applied to each cell's content to reduce it to true or false. The resulting logical index tells which filenames should be kept.
f = dir('path/to/folder');
names = {f.name};
ind = cellfun(@any, regexp(names, '^file\.dat\.\d+$'));
names = names(ind);
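The same anchored pattern can be tried outside MATLAB. Below is a small Python sketch of the filter, where `numbered_parts` is a hypothetical helper and `re.fullmatch` plays the role of the ^ and $ anchors.

```python
import re

# Same pattern as above; fullmatch requires the whole name to match.
PATTERN = re.compile(r"file\.dat\.\d+")


def numbered_parts(names):
    """Keep only names of the form file.dat.<one or more digits>."""
    return [name for name in names if PATTERN.fullmatch(name)]


listing = ["file.dat", "file.dat._", "file.dat.000", "file.dat.0001"]
kept = numbered_parts(listing)
```

Here `kept` is ["file.dat.000", "file.dat.0001"], so both file.dat and file.dat._ are excluded regardless of how many digits the suffix has.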

Substitute only one part of a string using perl

I have an array whose items contain some symbols that I want to remove. Even though I found a solution, I would like to know if this is the right way, because I'm afraid that if I apply it to the whole array it will remove characters that I might need in future arrays.
Here is an example item on my array:
$string1='22 | logging monitor informational';
so I try the following:
$string1=~ s/\s{6}\|(?=\s{6})//;
So my output is:
22 logging monitor informational
Is there another way that better matches "|"? I just want to remove the pipe character.
Thanks in advance
"I want to remove just the pipe character."
OK, then do this:
$string1 =~ s/\|//;
This will remove the first pipe character in the string. (You said in another comment that you don't want to remove any additional pipe characters.) If that's not what you want, then I'd suggest telling us exactly what you do want. We can't read minds, you know.
In the meantime, I'd also strongly recommend reading the Perl regular expressions tutorial.
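To make "remove only the first pipe" concrete, here is the equivalent operation sketched in Python. It is an illustration rather than Perl, and the sample string is the one from the question as it appears above.

```python
string1 = "22 | logging monitor informational"

# The count argument 1 limits the replacement to the first "|",
# like s/\|// without the /g modifier in Perl.
cleaned = string1.replace("|", "", 1)

# A later pipe is left alone.
two_pipes = "a | b | c".replace("|", "", 1)
```

The whitespace around the pipe is left untouched, so `cleaned` still contains the spaces that surrounded it.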