How to parse out file names and import the data into a CSV file using a shell script? - matlab

I have a Matlab program that outputs data into filenames; for example, a filename looks like
Temperature_10_Volumn_5.dat
The file itself is empty; the data I need is in the name. I have 1000 files like this. What would be the easiest shell script to parse out the numbers (here, 10 and 5) and write them to a CSV file, so that Matlab can read the CSV file and plot the graph? Thanks!

Here is a script that will print all your values into a single file, with values separated by ";".
Usage:
parse_file.sh /home/user/your_dir_to_files output.csv
#!/bin/bash
#title       :parse_file.sh
#description :parse file names into a single file
#author      :Bertrand Martel
#date        :13/08/2015
file_list=$(ls "$1")
IFS=$'\n' #iterate over whole lines, not words
#empty your output file
cp /dev/null "$2"
for i in $file_list; do
  #get the temperature between the first pair of "_"
  temperature=$(echo "$i" | awk -F '_' '{print $2}')
  #remove everything up to and including "Volumn"
  second_part=$(echo "$i" | sed 's/.*Volumn//')
  #get the volume between "_" and "."
  volume=$(echo "$second_part" | awk -F '[_.]' '{print $2}')
  new_line="$temperature;$volume"
  #append to the output file
  echo "$new_line" >> "$2"
done
cat "$2"
I created a gist with the file https://gist.github.com/bertrandmartel/fd68c0373af35eaba934
I assume your directory contains only files of this kind.
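For example, with Temperature_10_Volumn_5.dat and Temperature_20_Volumn_7.dat in /tmp/data (a hypothetical path), a run would look like this:
$ ./parse_file.sh /tmp/data output.csv
10;5
20;7
On the Matlab side, a semicolon-delimited file like this can then be loaded with something like dlmread('output.csv', ';') and plotted from there.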

Related

Rename batch files in a folder using a text file

I have a folder of files that start with specific strings, and I would like to replace part of each filename using the corresponding column from a text file.
Folder with files
ABC_S1_002.txt
ABC_S1_003.html
ABC_S1_007.png
NMC_D1_002.png
NMC_D2_003.html
And I have a text file that has the strings to be replaced as:
ABC ABC_newfiles
NMC NMC_extra
So the folder after renaming will be
ABC_newfiles_S1_002.txt
ABC_newfiles_S1_003.html
ABC_newfiles_S1_007.png
NMC_extra_D1_002.png
NMC_extra_D2_003.html
I tried it file by file using mv:
for f in ABC*; do mv "$f" "${f/ABC/ABC_newfiles}"; done
How can I read in the text file that has the old strings in the first column and replace them with the new strings from the second column? I tried:
IFS=$'\n'; for i in $(cat file_rename);do oldName=$(echo $i | cut -d $'\t' -f1); newName=$(echo $i | cut -d $'\t' -f2); for f in oldName*; do mv "$f" "${f/oldName/newName}"; done ; done
Did not work though.
This might work for you (GNU parallel and rename):
parallel --colsep ' ' rename -n 's/{1}/{2}/' {1}* :::: textFile
This will list out the rename commands for each line in textFile.
Once the output has been checked, remove the -n option and run for real.
For a sed solution, try:
sed -E 's#(.*) (.*)#ls \1*| sed "h;s/\1/\2/;H;g;s/\\n/ /;s/^/echo mv /e"#e' testFile
Again, this will echo the mv commands; once the output has been checked, remove the echo and run for real.
Review the result of
sed -r 's#([^ ]*) (.*)#for f in \1*; do mv "$f" "${f/\1/\2}"; done#' textfile
When that looks right, you can copy-paste the result or wrap it in source:
source <(sed -r 's#([^ ]*) (.*)#for f in \1*; do mv "$f" "${f/\1/\2}"; done#' textfile)
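For reference, the pure-bash attempt in the question fails because oldName and newName are used as literal strings rather than expanded as variables, both in the glob (oldName*) and in the substitution (${f/oldName/newName}). A corrected sketch, assuming file_rename is tab-separated as in the question:
while IFS=$'\t' read -r oldName newName; do
  for f in "$oldName"*; do
    [ -e "$f" ] || continue              # skip when the glob matches nothing
    mv -- "$f" "${f/$oldName/$newName}"
  done
done < file_rename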

How to remove some text in long filenames from a bunch of files in a directory

I can't boot my Windows PC today, so I am on my second OS, Linux Mint. With my limited knowledge of Linux and shell scripts, I really have no idea how to do this.
I have a bunch of files in a directory, generated by my system, and I need to remove the last 12 characters before the ".txt" extension.
Sample filenames:
filename1--2c4wRK77Wk.txt
filename2-2ZUX3j6WLiQ.txt
filename3-8MJT42wEGqQ.txt
filename4-sQ5Q1-l3ozU.txt
filename5--Way7CDEyAI.txt
Desired result:
filename1.txt
filename2.txt
filename3.txt
filename4.txt
filename5.txt
Any help would be greatly appreciated.
Here is a programmatic way of doing this while still trying to account for pesky edge cases:
#!/bin/bash
set -e
find . -name "filename*" > /tmp/filenames.list
while read -r FILENAME; do
  NEW_FILENAME="$(
    echo "$FILENAME" | \
    awk -F '.' '{$NF=""; gsub(/ /, "", $0); print}' | \
    awk -F '/' '{print $NF}' | \
    awk -F '-' '{print $1}'
  )"
  EXTENSION="$(echo "$FILENAME" | awk -F '.' '{print $NF}')"
  if [[ "$EXTENSION" == "backup" ]]; then
    continue
  else
    cp "$FILENAME" "${FILENAME}.backup"
  fi
  if [[ -z "$EXTENSION" ]]; then
    mv "$FILENAME" "$NEW_FILENAME"
  else
    mv "$FILENAME" "${NEW_FILENAME}.${EXTENSION}"
  fi
done < /tmp/filenames.list
Create a List of Files to Edit
First up, create a list of the files that you would like to edit (assuming that they all start with filename) under the current working directory (.):
find . -name "filename*" > /tmp/filenames.list
If they don't start with filename, fret not; you could always use a find command like:
find . -type f > /tmp/filenames.list
Iterate over a list of files
To accomplish this we use a while read loop:
while read -r LINE; do
# perform action
done < file
If you have the ability to use bash, you could instead use process substitution:
while read -r LINE; do
# perform action
done < <(
find . -type f
)
Create a rename variable
Next, we create a variable NEW_FILENAME, using awk to strip off the file extension and any stray spaces:
awk -F '.' '{$NF=""; gsub(/ /, "", $0); print}'
We could just use the following instead, if you know for certain that there aren't multiple periods in the filename:
awk -F '.' '{print $1}'
The leading ./ is stripped off via
awk -F '/' '{print $NF}'
although this could have been easily done via basename
With the following command, we strip everything after the first -:
awk -F '-' '{print $1}'
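Putting those three awk stages together on one of the sample names shows the whole transformation (a quick sketch):
$ echo "./filename4-sQ5Q1-l3ozU.txt" \
    | awk -F '.' '{$NF=""; gsub(/ /, "", $0); print}' \
    | awk -F '/' '{print $NF}' \
    | awk -F '-' '{print $1}'
filename4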
Creating backups
Feel free to remove this if you deem it unnecessary:
if [[ "$EXTENSION" == "backup" ]]; then
continue
else
cp "$FILENAME" "${FILENAME}.backup"
fi
One thing that we definitely don't want is to make backups of backups. The above logic accounts for this.
Renaming the files
One thing that we don't want to do is append a period to a filename that doesn't have an extension. This accounts for that.
if [[ -z "$EXTENSION" ]]; then
  mv "$FILENAME" "$NEW_FILENAME"
else
  mv "$FILENAME" "${NEW_FILENAME}.${EXTENSION}"
fi
Other things of note
Odds are that your Linux Mint installation has a bash shell (which the [[ tests above rely on), so you could simplify some of these commands. For instance, you could use parameter expansion: echo "$FILENAME" | awk -F '.' '{print $NF}' would become "${FILENAME##*.}"
[[ is not defined in POSIX sh, so if you need the script to run under plain sh, you will need to replace [[ with [, but review this document first:
https://mywiki.wooledge.org/BashFAQ/031
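Given how regular the sample names are, a much shorter pure-bash alternative is possible; a minimal sketch, assuming every name ends in exactly 12 junk characters followed by .txt and that no backups are wanted:
for f in filename*.txt; do
  mv -- "$f" "${f%????????????.txt}.txt"   # strip the 12 chars before .txt
done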
From the pattern of the filenames, it looks like the first token can be picked out before the "-". After changing into the directory where the files are located, use the following command to rename them:
for srcFile in *; do fileN=$(echo "$srcFile" | cut -d'-' -f1); mv "$srcFile" "$fileN.txt"; done
If the above observation is wrong, the following command can be used to remove exactly the 12 characters before .txt (16 characters in total, counting the 4-character extension, which is then re-appended):
for srcFile in *; do fileN=$(echo "$srcFile" | rev | cut -c17- | rev); mv "$srcFile" "$fileN.txt"; done
The bare * can be replaced with a pattern such as filename*.txt to filter which files in the current directory are renamed.

Remove everything in a line before comma

I have multiple files with lines like:
foo, 123456
bar, 654321
baz, 098765
I would like to remove everything on each line before (and including) the comma.
The output would be:
123456
654321
098765
I attempted to use the following after seeing something similar on another question, but the user didn't leave an explanation, so I'm not sure how the wildcard would be handled:
find . -name "*.csv" -type f | xargs sed -i -e '/*,/d'
Thank you for any help you can offer.
METHOD 1:
(As to your attempt: in sed, /pattern/d deletes whole matching lines rather than editing them, so it isn't what you want here.)
If it's always the 2nd column you want, you can do this with awk. This command actually splits the rows on the whitespace rather than the comma, so it gets your second column, the numbers, without the leading space:
awk '{print $2}' < whatever.csv
METHOD 2:
Or to get everything after the comma (including the space):
sed -e 's/^.*,//g' < whatever.csv
METHOD 3:
If you want to find all of the .csv files and get the output of all of them together, you can do:
sed -e 's/^.*,//g' `find . -name '*.csv' -print`
METHOD 4:
Or the same way you were starting to -- with find and xargs:
find . -name '*.csv' -type f -print | xargs sed -e 's/^.*,//'
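If the goal is instead to modify the files in place, as the -i in your original attempt suggests, that flag can be combined with this approach; a sketch, assuming GNU sed (the -print0/-0 pair guards against spaces in filenames):
find . -name '*.csv' -type f -print0 | xargs -0 sed -i -e 's/^.*,//'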
METHOD 5:
To make all of the .csv files into .txt files, processed in the way described above, you can write a brief shell script like this:
Create a script "bla.sh":
#!/bin/sh
for infile in `find . -name '*.csv' -print` ; do
  outfile=`echo "$infile" | sed -e 's/\.csv$/.txt/'`
  echo "$infile --> $outfile"
  sed -e 's/^.*,//g' < "$infile" > "$outfile"
done
Make it executable by typing this:
chmod 755 bla.sh
Then run it:
./bla.sh
This will create a .txt output file with everything after the comma for each .csv input file.
ALTERNATE METHOD 5:
Or if you need them to be named .csv, the script could be updated like this; it simply makes an output file named "file-new.csv" for each input file named "file.csv":
#!/bin/sh
for infile in `find . -name '*.csv' -print` ; do
  outfile=`echo "$infile" | sed -e 's/\.csv$/-new.csv/'`
  echo "$infile --> $outfile"
  sed -e 's/^.*,//g' < "$infile" > "$outfile"
done
Something like this should work for a single file. Let's say the input is 'yourfile' and you want the output to go to 'outfile'.
sed 's/^.*,//' < yourfile > outfile
The syntax to do a search-and-replace is s/input_pattern/replacement/
The ^ anchors the input pattern to the beginning of the line.
A dot . matches any single character; .* matches a string of zero or more of any character.
The , matches the comma.
The replacement pattern is empty, so whatever matched the input_pattern
will be removed.
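Note that with the sample data, which has a space after each comma, this leaves a leading space on every output line. To consume that space as well, a small variant of the same command works:
sed 's/^.*, *//' < yourfile > outfile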

Dynamically building an exclude list in both rsync & egrep formats

I wonder if anyone out there can assist me with an issue.
I have written a set of shell scripts for auditing remote file systems against a GOLD build held on an audit server.
As part of this, I do the following:
1) Use rsync to work out any new, modified, or removed files and directories
2) Use find ${source_filesystem} -ls on both local & remote to work out permissions differences
Now, as part of this, there are certain files and directories that I am excluding, e.g. logs, trace files, etc.
So in order to achieve this I use 2 methods:
1) RSYNC - I have an exclude-list that is added using the --exclude-from flag
2) find -ls - I use an egrep -v statement to exclude the same entries as the rsync exclude-list:
e.g. find -L ${source_filesystem} -ls | egrep -v "$SEXCLUDE_supt"
So my issue is that I have to maintain 2 separate lists, and this is a bit of an admin nightmare.
I am looking for some assistance, or some advice on whether it is possible to dynamically build a list of exclusions that can be used for both the rsync and the find -ls.
Here is the format of what the exclude lists look like:
RSYNC:
*.log
*.out
*.csv
logs
shared
tracing
jdk*
8.6_Code
rpsupport
dbarchive
inarchive
comms
PR116PICL
**/lost+found*/
dlxwhsr*
regression
tmp
working
investigation
Investigation
dcsserver_weblogic_*.ear
dcswebrdtEAR_weblogic_*.ear
FIND:
SEXCLUDE_supt="\.log|\.out|\.csv|logs|shared|PR116PICL|tracing|lost\+found|jdk|8\.6\_Code|rpsupport|dbarchive|inarchive|comms|dlxwhsr|regression|tmp|working|investigation|Investigation|dcsserver_weblogic_|dcswebrdtEAR_weblogic_"
You don't need to create a second list for your find command. grep can handle a list of patterns using the -f flag. From the manual:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero
patterns, and therefore matches nothing. (-f is specified by POSIX.)
Here's what I'd do:
find -L ${source_filesystem} -ls | grep -Evf your_rsync_exclude_file_here
This should also work for filenames containing newlines and spaces. Please let me know how it goes.
In the end the grep -Evf approach was a bit of a nightmare, as rsync doesn't support regexes: its exclude patterns look similar but follow glob-style matching rules instead.
So I then pursued my other idea of dynamically building the exclude list for egrep, by parsing the rsync exclude-list and building the variable on the fly to pass into egrep.
This is the method I used:
#!/bin/ksh
# Create Signature of current build
AFS=$1
#Create Signature File
crSig()
{
  find -L ${SRC} -ls | egrep -v "$SEXCLUDE" | awk '{fws = ""; for (i = 11; i <= NF; i++) fws = fws $i " "; print $3, $6, fws}' | sort > ${BASE}/${SIFI}.${AFS}
}
#Setup SRC, TRG & SCROOT
LoadAuditReqs()
{
  export SRC=`grep ${AFS} ${CONF}/fileSystem.properties | awk '{print $2}'`
  export TRG=`grep ${AFS} ${CONF}/fileSystem.properties | awk '{print $3}'`
  export SCROOT=`grep ${AFS} ${CONF}/fileSystem.properties | awk '{print $4}'`
  export BEXCLUDE=$(sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' ${CONF}/exclude-list.${AFS} | tr "\n" "|")
  export SEXCLUDE=$(echo ${BEXCLUDE} | sed 's/\(.*\)|/\1/')
}
#Load Properties File
LoadProperties()
{
  . /users/rpapp/rpmonit/audit_tool/conf/environment.properties
}
#Functions
LoadProperties
LoadAuditReqs
crSig
So with these new variables:
export BEXCLUDE=$(sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' ${CONF}/exclude-list.${AFS} | tr "\n" "|")
export SEXCLUDE=$(echo ${BEXCLUDE} | sed 's/\(.*\)|/\1/')
I use sed to remove "*" and "/", then match my special characters and prepend them with "\" to escape them.
Then, using tr, each newline is replaced with "|", and the final sed removes the trailing "|", producing the $SEXCLUDE variable that egrep uses in the crSig function.
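To see what the pipeline produces, here is a quick demo on a few hypothetical exclude-list entries:
$ printf '%s\n' '*.log' 'lost+found' '8.6_Code' \
    | sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' \
    | tr '\n' '|' \
    | sed 's/\(.*\)|/\1/'
\.log|lost\+found|8\.6\_Code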
What do you think?

Unix - Split into N files using a regexp to name the destination file

How do I split a file into N files, using the first 2 characters of each line as the destination filename?
Example input file:
AA23409234TEXT
BA23201202Other Text
AA23509234YADA
BA23202202More Text.
C1000000000000000000
Should generate 3 files:
AA.txt
AA23409234TEXT
AA23509234YADA
BA.txt
BA23201202Other Text
BA23202202More Text.
C1.txt
C1000000000000000000
I'm thinking of using a sed script similar to this:
/^(..)/w \1
But what that really does is create a file named '\1' instead of the capture group.
Any ideas?
$ awk '{fname = substr($0, 1, 2) ".txt"; print >> fname}' input.txt
Or
$ while read -r line; do echo "$line" >> "${line:0:2}.txt"; done < input.txt
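If the input has many distinct prefixes, the awk one-liner can exhaust the limit on open file descriptors (notably in non-GNU awks). A variant sketch that closes each file after writing avoids this, at some speed cost; since it appends, remove any old output files before re-running:
awk '{fname = substr($0, 1, 2) ".txt"; print >> fname; close(fname)}' input.txt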
The first thing you need to do is determine all of your file names:
filenames=$(sed 's/\(..\).*/\1/' listOfStrings.txt | sort | uniq)
Then, loop through those filenames
for filename in $filenames
do
  sed -n "/^$filename/p" listOfStrings.txt > "$filename.txt"
done
I have not tested this, but I think it should work.
This might work for you:
sed 's/\(..\).*/echo "&" >>\1.txt/' file | sh
or if you have GNU sed:
sed 's/\(..\).*/echo "&" >>\1.txt/e' file