I have a question: I have downloaded rainfall data as a text file (*.txt). It includes header text, footer text, and the data itself, with some blank lines between the data rows.
When I import the file into Matlab, Matlab cannot split it into rows and columns, because the fields are not delimited.
What I have done so far is convert the text file to Excel, remove the unwanted columns and rows (which is easy by hand), and save the result as a new file. But my data set is up to 846000 values (~24 h x 30 days x 12 months x 20 years), combined from many different files, so converting everything manually like that would be very difficult.
My adviser told me that Matlab code could do this well. Can anyone help me with this problem?
The original: https://drive.google.com/file/d/0By5tEg03EXCpekNaemItMF85ZWs/edit?usp=sharing
If you're on Mac or Linux, I suggest using the shell to convert these data files into a format Matlab will like, rather than trying to make Matlab do it. This works on Windows too, but only if you have a Unix-like shell installed, such as MinGW, Cygwin or Git Bash.
For example, this converts the raw data section of the file you shared into CSV:
cat "$file" | sed 's: *:,:g' | sed 's:^,::' | grep '^[0-9]' > "$file".csv
You could then loop through all your raw data files and combine them into a single CSV like this:
for file in *.txt; do
cat "$file" | sed 's: *:,:g' | sed 's:^,::' | grep '^[0-9]' >> all.csv
done
If you need to preserve, for example, which year and which weather station each value came from, you could get a little fancier and capture those values from the beginning of each file, then turn them into extra columns on each line. Here's an example that grabs the year and the weather station ID and inserts them as columns before each day's row.
for file in *.txt; do
station="$(grep 'Station -' "$file" | sed 's: *Station - ::' | sed 's: .*::' | uniq)"
year="$(grep 'Water Year' "$file" | awk '{print $4}')"
cat "$file" | sed 's: *:,:g' | grep '^,[0-9]' |\
sed "s/^,/$station,$year,/" >> all.csv
done
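With made-up values (the station ID 47RG03 and the year 1995 below are purely hypothetical), each line appended to all.csv would then look something like:
47RG03,1995,1,0.0,0.2,0.0,1.4
i.e. the station and year become the first two columns, followed by whatever the original data row contained.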
I have a file, ABDC.DELTA00.TS.D20161022.TS_BAR99.DAT.DOCC.
I want to cut the text between two strings: the first TS and DOCC. I tried
efvar4=$(echo $filename | sed -n "s/.*TS//;s/DOCC.*//p")
resulting in _BAR99.DAT – matching the second TS in the filename.
Desired result: .TS.D20161022.TS_BAR99.DAT.
How do I modify my sed command to achieve the desired result?
echo "ABDC.DELTA00.TS.D20161022.TS_BAR99.DAT.DOCC" | sed 's/^.*\.TS\./.TS./;s/\.DOCC/./'
I have 4 text files, each of which contains a single column of data (~2000 lines per file). What I am trying to do is compare all of the files and determine the overlap between them. So I would want to know what is in file1 but not in the other 3 files, what is in file2 but not in the other 3, what is in file1 and file2 only, etc. The ultimate goal is to make a venn diagram with 4 overlapping circles showing the various overlaps between the files.
I have been racking my brain trying to figure out how to do this. I have been playing with the comm and diff commands but am having trouble applying them across all of the files. Would anyone have any suggestions on how to do this?
Thanks for any help or suggestions.
Assuming 4 files named a b c d
lines existing in file a but not in any of the others (I assume ^ is a char not used in any of the files):
for l in $(sort -u a);do echo "$l"^$(grep -Fxc "$l" b c d);done | grep 'b:0 c:0 d:0$' | cut -d\^ -f1
lines existing in all of them:
for l in $(sort -u a);do echo "$l"^$(grep -Fxc "$l" b c d);done | grep 'b:[1-9][0-9]* c:[1-9][0-9]* d:[1-9][0-9]*$' | cut -d\^ -f1
...
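Since the question mentions comm, here is a sketch of the pairwise comparisons done with comm (this assumes a bash-like shell for the process substitution; comm needs sorted input):
# lines present in both a and b
comm -12 <(sort -u a) <(sort -u b)
# lines in a but not in b
comm -23 <(sort -u a) <(sort -u b)
Covering all the regions of a 4-way venn diagram this way means repeating such comparisons for every combination of files, which is where the loop above becomes more convenient.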
I am doing some calculations using Gaussian. From the Gaussian output file, I need to extract the input structure information. The output file contains more than 800 structure coordinates. What I have done so far is collect all the input coordinates using a combination of the grep, awk and sed commands, like so:
grep -A 7 "Input orientation:" test.log | grep -A 5 "C" | awk '/C/{print "structure number"}1' | sed '/--/d' > test.out
This helped me grep all the input coordinates and insert a line reading "structure number". So now I have a file in which this pattern is repeated in a regular fashion. The file looks like the following:
structure Number
4.176801 -0.044096 2.253823
2.994556 0.097622 2.356678
5.060174 -0.115257 3.342200
structure Number
4.180919 -0.044664 2.251182
3.002927 0.098946 2.359346
5.037811 -0.103410 3.389953
Here, "Structure number" is being repeated. I want to write a number like "structure number:1", "structure number 2" in increasing order.
How can I solve this problem?
Thanks for your help in advance.
I am not familiar at all with a program called Gaussian, so I have no clue what the original input looked like. If someone posts an example I might be able to give an even shorter solution.
However, as far as I understand it, the OP is content with the output of his/her code, except that he/she wants to append an increasing number to the lines inserted with awk.
This can be achieved with the following line (adjusting the OP's code):
grep -A 7 "Input orientation:" test.log | grep -A 5 "C" | awk '/C/{print "structure number"++i}1' | sed '/--/d' > test.out
Addendum:
Even without knowing the actual input, I am sure that one can at least get rid of the sed command, leaving that piece of work to awk. Also, there is no need to quote a single-character grep pattern:
grep -A 7 "Input orientation:" test.log | grep -A 5 C | awk '/C/{print "structure number"++i}!/--/' > test.out
I am not sure since I cannot test, but it should be possible to let awk do the grep's work, too. As a first guess I would try the following:
awk '/Input orientation:/{li=7}!li{next}{--li}/C/{print "structure number"++i;lc=5}!lc{next}{--lc}!/--/' test.log > test.out
While this might be a little bit longer in code it is an awk-only solution doing all the work in one process. If I had input to test with, I might come up with a shorter solution.
I have a text file which looks something like this:
jdkjf
kjsdh
jksfs
lksfj
gkfdj
gdfjg
lkjsd
hsfda
gadfl
dfgad
[very many lines, that is]
but would rather like it to look like
jdkjf kjsdh
jksfs lksfj
gkfdj gdfjg
lkjsd hsfda
gadfl dfgad
[and so on]
so I can print the text file on a smaller number of pages.
Of course, this is not a difficult problem, but I'm wondering if there is some excellent tool out there for solving problems like these.
EDIT: I'm not looking for a way to remove every other newline from a text file, but rather a tool which interprets text as "pictures" and then lays these out on the page nicely (by writing the appropriate whitespace symbols).
You can use this Python code:
tables = input("Enter number of tables ")
matrix = []
file = open("test.txt")
for line in file:
    matrix.append(line.replace("\n", ""))
    if len(matrix) == int(tables):
        print(matrix)
        matrix = []
file.close()
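Assuming the script is saved as join_lines.py (a made-up name) next to a test.txt containing the example above, a run looks like:
$ python3 join_lines.py
Enter number of tables 2
['jdkjf', 'kjsdh']
['jksfs', 'lksfj']
...
Each printed list holds the requested number of consecutive lines; any leftover lines at the end (when the line count is not a multiple of that number) are never printed.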
(Since you don't name your operating system, I'll simply assume Linux, Mac OS X or some other Unix...)
Your example looks like it can also be described by the expression "joining 2 lines together".
This can be achieved in a shell (with the help of xargs and awk) -- but only for an input file that is structured like your example (the result always puts 2 words on a line, irrespective of how many words each one contains):
cat file.txt | xargs -n 2 | awk '{ print $1" "$2 }'
This can also be achieved with awk alone (this time it really joins 2 full lines, irrespective of how many words each one contains):
awk '{printf "%s ", $0; getline; print $0}' file.txt
Or use sed:
sed 'N;s#\n# #' < file.txt
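Here N appends the next input line to the pattern space, and the substitution then replaces the embedded newline with a space, so every pair of lines is printed as one.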
Also, xargs could do it:
xargs -L 2 < file.txt
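For completeness, paste can do the same pairwise joining on its own; a minimal sketch:
paste -d ' ' - - < file.txt
Giving standard input twice (- -) makes paste consume lines alternately, and -d ' ' joins each pair with a space instead of the default tab.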
I'm sure other people could come up with dozens of other, quite different methods and commandline combinations...
Caveats: You'll have to handle files with an odd number of lines explicitly; the last input line may not be processed correctly in that case.
I'm listing just the file basenames with an ls command like this, which I got from here:
ls --color -1 . | tr '\n' '\0' | xargs -0 -n 1 basename
I would like to list all the directories in the first column, all the executables in the next, all the regular files last (perhaps also with a column for each extension).
So the first (and main) "challenge" is to print multiple columns of different lengths.
Do you have any suggestions about which commands I should be using to write that script? Should I switch to find? Or should I just write the whole script in Perl?
I want to be able to optionally sort the columns by size too ;-) I'm not necessarily looking for a script to do the above, but perhaps some advice on ways to approach writing such a script.
#!/bin/bash
width=20
awk -F':' '
  /directory/ {            # lines like "./path: directory"
    d[i++]=$1
    next
  }
  /executable/ {           # lines whose file(1) description mentions "executable"
    e[j++]=$1
    next
  }
  {                        # everything else is treated as a regular file
    f[k++]=$1
  }
  END {
    a[1]=i; a[2]=j; a[3]=k
    asort(a)               # a[3] is now the largest of the three counts
    printf("%-*.*s | \t%-*.*s | \t%-*.*s\n", w,w,"Directories", w,w,"Executables", w,w,"Files")
    print "------------------------------------------------------------------------"
    for (i=0;i<a[3];i++)
      printf("%-*.*s |\t%-*.*s |\t%-*.*s\n", w,w,d[i], w,w,e[i], w,w,f[i])
  }' w=$width < <(find . -exec file {} +)
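For reference, each line that find . -exec file {} + feeds into awk looks roughly like "./some/dir: directory" or "./some/script: Bourne-Again shell script, ASCII text executable" (the exact descriptions vary by system), which is why the script splits fields on ':' and routes entries into the three arrays by matching the words "directory" and "executable".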
Sample output HERE
This can be further improved upon by calculating the longest entry per column and using that as the width. I'll leave that as an exercise for the reader.