My question revolves around a process I'm using to update a status file. I have a process running which does a simple
sed -i "s/info/newinfo/" file.txt
But this process can be called multiple times.
My question is, if two processes are running a sed command to modify the file at the same time, would that cause a problem?
I tried to test this by scheduling two at commands at the same time, doing two different sed modifications. They seemed to work fine, but I don't know whether they actually ran simultaneously. Maybe the command is so fast that two processes reading and writing the file never actually collide.
OK, let's show it with a not-so-big file:
cd /tmp
seq 1000000 2000000 > mediumfile.txt
ls -hl mediumfile.txt
-rw-r--r-- 1 user user 7.7M Sep 26 16:53 mediumfile.txt
wc mediumfile.txt
1000001 1000001 8000008 mediumfile.txt
OK, there are 1000k lines in my 7.7 MB file.
Now I drop 2 × 1001 lines simultaneously, using two separate sed processes (one deleting from 1801000 to 1802000, the other from 1803000 to 1804000):
sed '/1803000/,/1804000/d' -i mediumfile.txt & \
sed '/1801000/,/1802000/d' -i mediumfile.txt ;wait
[1] 30727
[1]+ Done sed '/1803000/,/1804000/d' -i mediumfile.txt
wc -l mediumfile.txt
999000 mediumfile.txt
There are about 1k lines too many! (1000001 − 2 × 1001 should leave 997999, not 999000.)
grep '180[13]400' mediumfile.txt
1803400
So it is: the 1801000–1802000 range is gone, but 1803400 survived. Each sed -i writes its changes to a temporary copy of the whole file and renames it into place, so the sed that finishes last silently overwrites the other one's work.
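If the updates really can arrive concurrently, one way to make them safe (a sketch, assuming the util-linux flock utility is available) is to serialize the writers on a lock file:
# Same experiment, but serialized: flock creates /tmp/medium.lock if needed
# and holds an exclusive lock for the duration of each sed -i, so only one
# rewrite of mediumfile.txt happens at a time.
flock /tmp/medium.lock sed '/1803000/,/1804000/d' -i mediumfile.txt & \
flock /tmp/medium.lock sed '/1801000/,/1802000/d' -i mediumfile.txt ; wait
With the lock in place, both deletions survive and wc -l should report 997999.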
I need to run one input file after another (same folder) and have previously used something along the lines of:
sed -i -e 's/file1/file1/g'
sh run file 1
sed -i -e 's/file1/file2/g'
sh run file 2
However, I recently tried to use this and it wouldn't work. The run file works, so I assume I'm just using the sed -i -e command incorrectly?
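As a hedged guess at what was intended (the question elides the real file names, so control.inp and run.sh below are placeholders): sed -i does nothing without a file operand, and s/file1/file1/g replaces the text with itself, so a working step might look like:
# Hypothetical reconstruction: edit the control file that names the current
# input, then launch the run; repeat with the next name.
sed -i -e 's/file1/file2/g' control.inp
sh run.sh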
I would like to execute this make command to first replace the first line of all csv files inside the directory, and then replace the # with commas on the other lines.
The second command works fine and does what it is supposed to do, but the first one only replaces the first line of the first file.
Could anyone give me a hand with that?
csv:
	$(DOCKER_RUN) npm run csv-generator
	make format-csv

format-csv:
	@sed -i '' '1 s/^.*$$/"bar","repository"/g' $(CURDIR)/foo/npm/*.csv
	@sed -i '' 's/\(.*\)#/\1","/g' $(CURDIR)/foo/npm/*.csv
The reason that the first sed command "fails" is that sed doesn't reset the line counter between input files (at least not on your system, nor on my Mac OS X machine; see the comments):
$ cat test1
a
b
g
$ cat test2
aa
bb
cc
$ sed -n '=' test1 test2 # the '=' sed command outputs line numbers
1
2
3
4
5
6
This is why the first sed command isn't doing what you want it to do: it only affects the first line of the first file.
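As an aside, GNU sed (though not the BSD sed shipped with Mac OS X) has an -s/--separate flag that treats each input file independently, which resets the counter:
$ sed -sn '=' test1 test2   # GNU sed only: prints 1 2 3 1 2 3 instead of 1..6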
The solution is to loop over the files and call sed for each of them (untested in a Makefile):
@for f in $(CURDIR)/foo/npm/*.csv; do \
	sed -i '' '1 s/^.*$$/"bar","repository"/g' $$f; \
done
Using find and xargs will also work, just make sure that find isn't picking up files further down in the folders.
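For example (a sketch, assuming your find supports -maxdepth and your xargs supports -0), limiting the depth keeps find out of subfolders, and -n 1 hands each file to its own sed invocation so the line-1 address applies per file:
@find $(CURDIR)/foo/npm -maxdepth 1 -name '*.csv' -type f -print0 | \
	xargs -0 -n 1 sed -i '' '1 s/^.*$$/"bar","repository"/g'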
EDIT: In light of the comments on this answer, I would recommend avoiding the use of sed -i on multiple files altogether and converting both statements into for-loops (in this case, they can be collapsed into one loop with two statements):
@for f in $(CURDIR)/foo/npm/*.csv; do \
	sed -i '' '1 s/^.*$$/"bar","repository"/g' $$f; \
	sed -i '' 's/\(.*\)#/\1","/g' $$f; \
done
In my experience, for-loops are far more common in Makefiles than find and xargs. This is probably due to incompatibilities between find and xargs versions across Unices. Explicit loops also make the Makefile a lot easier to read.
I managed to solve it with:
@find $(CURDIR)/foo/npm -name "*.csv" -type f | xargs -L 1 sed -i '' '1 s/^.*$$/"bar"/g'
In a folder I have many files with several parameters in their filenames, e.g. (with just one parameter) file_a1.0.txt, file_a1.2.txt, etc.
These are generated by a C++ program, and I need to take the last one generated (in time). I don't know a priori what the value of this parameter will be when the program terminates. After that, I need to copy the 2nd line of this last file.
To copy the 2nd line of any file, I know that this sed command works:
sed -n 2p filename
I know also how to find the last generated file:
ls -rtl file_a*.txt | tail -1
Question:
How do I combine these two operations? Certainly it is possible to pipe the 2nd operation into the sed operation, but I don't know how to pass the filename from the pipe as input to that sed command.
You can use this:
ls -rt1 file_a*.txt | tail -1 | xargs sed -n '2p'
(OR)
sed -n '2p' `ls -rt1 file_a*.txt | tail -1`
sed -n '2p' $(ls -rt1 file_a*.txt | tail -1)
Typically you can put a command in back ticks to put its output at a particular point in another command - so
sed -n 2p `ls -rt name*.txt | tail -1 `
Alternatively - and preferred, because it is easier to nest etc -
sed -n 2p $(ls -rt name*.txt | tail -1)
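One caveat with both forms: the substituted file name undergoes word splitting, so a name containing spaces breaks the command. Quoting the command substitution avoids that (assuming the glob matches at least one file):
sed -n 2p "$(ls -rt name*.txt | tail -1)"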
-r in ls is reverse order.
-r, --reverse
reverse order while sorting
But it is not a good idea to combine it with tail -1.
With the change below (head -1, and no -r option in ls), performance can be slightly better, since the pipeline can stop after the first line instead of reading the whole listing through to tail:
sed -n 2p $(ls -t1 name*.txt | head -1 )
I was looking for a similar solution: taking the file names from a pipe of grep results to feed to sed. I've copied my answer here for the search & replace, but perhaps this example can help as it calls sed for each of the names found in the pipe:
This command simply finds all the files:
grep -i -l -r foo ./*
This one excludes this_shell.sh (in case you put the command in a script called this_shell.sh), tees the output to the console so you can see what happened, and then runs sed on each file name found, replacing the text foo with bar:
grep -i -l -r --exclude "this_shell.sh" foo ./* | tee /dev/fd/2 | while read -r x; do sed -b -i 's/foo/bar/gi' "$x"; done
I chose this method because I didn't like having the timestamps changed on files that weren't modified. Feeding in the grep result means only the files containing the target text are touched (which likely improves performance/speed as well).
Be sure to back up your files and test before using. This may not work in some environments for files with embedded spaces.
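For the embedded-spaces case, a null-delimited variant of the same pipeline should be safer (a sketch, assuming GNU grep and xargs, which provide -Z and -0):
# --null (-Z) makes grep -l print NUL-terminated file names; xargs -0 consumes
# them safely, so spaces or newlines in names cannot split arguments.
grep -irlZ --exclude "this_shell.sh" foo ./* | xargs -0 sed -b -i 's/foo/bar/gi'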
FWIW, I had some problems using the tail method: it seems that the entire dataset was generated before tail was called on just the last item.
I want to split a large text database (~10 million lines). I can use a command like
$ sed -i -e '4 s/(dB)//' -e '4 s/Best\ unit/Best_Unit/' -e '1,3 d' '/cygdrive/c/ Radio Mobile/Output/TRC_TestProcess/trc_longlands.txt'
$ split -l 1000000 /cygdrive/P/2012/Job_044_DM_Radio_Propogation/Working/FinalPropogation/TRC_Longlands/trc_longlands.txt 1
The first line cleans the database and the next splits it, but then the output files do not have the field names. How can I incorporate the field names into each dataset, and produce a list with the original file, new file name, and line numbers (from the original file) in it? This is so that it can be used in the ArcGIS model to re-join the final simplified polygon datasets.
Alternatively, and more usefully, since this needs to go into an ArcGIS model, a Python-based solution is best. More details are in https://gis.stackexchange.com/questions/21420/large-point-to-polygon-by-buffer-join-buffer-dissolve-issues#comment29062_21420 and Remove specific lines from a large text file in python.
So, going with a Cygwin-based solution as per the answer by icyrock.com, we have process_text.sh:
cd /cygdrive/P/2012/Job_044_DM_Radio_Propogation/Working/FinalPropogation/TRC_Longlands
mkdir processing
cp trc_longlands.txt processing/trc_longlands.txt
cd txt_processing
sed -i -e '4 s/(dB)//' -e '4 s/Best\ unit/Best_Unit/' -e '1,3 d' 'trc_longlands.txt'
split -l 1000000 trc_longlands.txt trc_longlands_
cat > a
h
1
2
3
4
5
6
7
8
9
^D
split -l 3
split -l 3 a 1
mv 1aa 21aa
for i in 1*; do head -n1 21aa|cat - $i > 2$i; done
for i in 21*; do echo ---- $i; cat $i; done
how can "TRC_Longlands" and the path be replaced with the input filename -in python we have %path%/%name for this.
in the last line is "do echo" necessary?
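On the path/name question: bash scripts receive arguments as $1, $2, ..., and dirname/basename split a path much like %path% and %name%. A sketch of a parameterized process_text.sh (untested; the file names are taken from the question):
#!/bin/bash
# Usage: bash process_text.sh /path/to/trc_longlands.txt
infile=$1
dir=$(dirname "$infile")          # the %path% part
name=$(basename "$infile" .txt)   # the %name% part, extension stripped
cd "$dir"
mkdir -p processing
cp "$infile" "processing/$name.txt"
cd processing
sed -i -e '4 s/(dB)//' -e '4 s/Best\ unit/Best_Unit/' -e '1,3 d' "$name.txt"
split -l 1000000 "$name.txt" "${name}_"
As for "do echo": it only prints a ---- banner before each chunk so the output is readable; it is not required for the loop itself.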
And this is called from Python using:
import os
os.system("process_text.bat")
where process_text.bat is basically
bash process_text.sh
I get the following error when it is run from DOS:
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\georgec>bash P:\2012\Job_044_DM_Radio_Propogation\Working\FinalPropogation\TRC_Longlands\process_text.sh
'bash' is not recognized as an internal or external command, operable program or batch file.
Also, when I run the bash command from Cygwin, I get:
georgec@ATGIS25 /cygdrive/P/2012/Job_044_DM_Radio_Propogation/Working/FinalPropogation/TRC_Longlands
$ bash process_text.sh
: No such file or directory: /cygdrive/P/2012/Job_044_DM_Radio_Propogation/Working/FinalPropogation/TRC_Longlands
cp: cannot create regular file `processing/trc_longlands.txt\r': No such file or directory
: No such file or directory: txt_processing
: No such file or directoryds.txt
But the files are created in the root directory.
Why is there a "." after the directory name? How can the files be given a .txt extension?
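Incidentally, the stray \r in `processing/trc_longlands.txt\r' points at Windows (CRLF) line endings in the script: bash sees the carriage return as part of each line, which also explains the mangled directory names. A common fix under Cygwin (assuming tr is available; dos2unix works too, if installed) is:
# Strip carriage returns so bash no longer sees \r at the end of every line.
tr -d '\r' < process_text.sh > process_text_unix.sh
bash process_text_unix.sh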
If you want to just prepend the first line of the original file to all but the first of the splits, you can do something like:
$ cat > a
h
1
2
3
4
5
6
7
^D
$ split -l 3
$ split -l 3 a 1
$ ls
1aa 1ab 1ac a
$ mv 1aa 21aa
$ for i in 1*; do head -n1 21aa|cat - $i > 2$i; done
$ for i in 21*; do echo ---- $i; cat $i; done
---- 21aa
h
1
2
---- 21ab
h
3
4
5
---- 21ac
h
6
7
Obviously, the first file will have one line less than the middle parts, and the last part might be shorter too, but if that's not a problem, this should work just fine. Of course, if your header has more lines, just change head -n1 to head -nX, X being the number of header lines.
Hope this helps.
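Applied to the file from the question, the same idea might look like this (a sketch; it also avoids the first chunk coming up one line short, by splitting only the body):
# Keep the header aside, split everything after it, then re-attach the header.
head -n 1 trc_longlands.txt > header.txt
tail -n +2 trc_longlands.txt | split -l 1000000 - trc_longlands_
for f in trc_longlands_*; do cat header.txt "$f" > "hdr_$f" && rm "$f"; done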
Does anybody know if there is an rsync option so that directories that are being traversed do not show up on stdout?
I'm syncing music libraries, and the massive amount of directories make it very hard to see which file changes are actually happening.
I've already tried -v and -i, but both also show directories.
If you're using --delete in your rsync command, the problem with calling grep -E -v '/$' is that it will also omit informational lines like:
deleting folder1/
deleting folder2/
deleting folder3/folder4/
If you're making a backup and the remote folder has been completely wiped out for whatever reason, the sync will wipe out your local folder as well, and with those lines filtered away you won't see the deletions happening.
To omit the already-existing folders but keep the deleting lines, you can use this expression:
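As a safeguard while tuning such filters, rsync's standard -n/--dry-run flag reports what would be transferred or deleted without touching anything:
rsync -avn --delete remote_folder local_folder   # -n: report deletions, change nothing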
rsync -av --delete remote_folder local_folder | grep -E '^deleting|[^/]$'
I'd be tempted to filter by piping through grep -E -v '/$', which uses an end-of-line anchor to remove lines that finish with a slash (i.e., directories).
Here's the demo terminal session where I checked it...
cefn@cefn-natty-dell:~$ mkdir rsynctest
cefn@cefn-natty-dell:~$ cd rsynctest/
cefn@cefn-natty-dell:~/rsynctest$ mkdir 1
cefn@cefn-natty-dell:~/rsynctest$ mkdir 2
cefn@cefn-natty-dell:~/rsynctest$ mkdir -p 1/first 1/second
cefn@cefn-natty-dell:~/rsynctest$ touch 1/first/file1
cefn@cefn-natty-dell:~/rsynctest$ touch 1/first/file2
cefn@cefn-natty-dell:~/rsynctest$ touch 1/second/file3
cefn@cefn-natty-dell:~/rsynctest$ touch 1/second/file4
cefn@cefn-natty-dell:~/rsynctest$ rsync -r -v 1/ 2
sending incremental file list
first/
first/file1
first/file2
second/
second/file3
second/file4
sent 294 bytes received 96 bytes 780.00 bytes/sec
total size is 0 speedup is 0.00
cefn@cefn-natty-dell:~/rsynctest$ rsync -r -v 1/ 2 | grep -E -v '/$'
sending incremental file list
first/file1
first/file2
second/file3
second/file4
sent 294 bytes received 96 bytes 780.00 bytes/sec
total size is 0 speedup is 0.00