Find lines from number of characters - find

Hello all, I have a file with 71989 lines open in Notepad++. Most of the lines have 11 commas (,) because I have 11 columns, but when I load the file into SQL I get errors because some lines have only 9 records, so 9 commas.
Is there a way in Notepad++ to find which lines have 9 commas instead of 11?
I can also use Python.
Thank you in advance

Using Python you could count all occurrences of a comma on each line...
'''
contents of data.txt (your SQL file for example)...
1,2,3,4,5,6,7,8,9,10,11,
,,,,,,,,,,,
,,,,,,,,,
,,,,,,,,,,,
1,2,3,4,5,6,7,8,9,
Note: Nine commas on the third and fifth lines.
'''
with open('data.txt') as f:
    lines = f.readlines()

# Number lines from 1 so the output matches the editor's line numbers.
for line_number, line in enumerate(lines, 1):
    if line.count(',') == 9:
        print('Check line number: ' + str(line_number))
Outputs:
Check line number: 3
Check line number: 5
Maybe that helps :o?
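If the short lines might not all have exactly 9 commas, a small variation is to report every line whose comma count differs from the expected 11. This is only a sketch: the function name lines_with_wrong_comma_count and the in-memory sample list are made up for illustration, and the sample mirrors the data.txt contents shown above.

```python
def lines_with_wrong_comma_count(lines, expected=11):
    """Return (line_number, comma_count) for every line that does not
    contain exactly `expected` commas. Line numbers start at 1."""
    return [
        (number, line.count(","))
        for number, line in enumerate(lines, 1)
        if line.count(",") != expected
    ]


# Hypothetical sample that mirrors the data.txt example above.
sample = [
    "1,2,3,4,5,6,7,8,9,10,11,\n",
    "," * 11 + "\n",
    "," * 9 + "\n",
    "," * 11 + "\n",
    "1,2,3,4,5,6,7,8,9,\n",
]

for number, count in lines_with_wrong_comma_count(sample):
    print(f"Check line number: {number} ({count} commas)")
```

For a real file, the list could be replaced by the open file object, since iterating over it yields one line at a time.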

Ctrl+M (opens the Mark dialog)
Find what: ^[^,]*(?:,[^,]*){9}$
Check Wrap around
Check Regular expression
Check Bookmark line
Click Mark All
Explanation:
^ # beginning of line
[^,]* # 0 or more non comma
(?: # start non capture group
, # a comma
[^,]* # 0 or more non comma
){9} # end group, must appear 9 times
$ # end of line
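Since the question says Python is also an option, the same pattern can be tried outside Notepad++. The sketch below assumes the text is already in a string; the names NINE_COMMAS and matching_line_numbers are made up for illustration.

```python
import re

# Same pattern as in the Notepad++ answer: a line with exactly nine commas.
NINE_COMMAS = re.compile(r"^[^,]*(?:,[^,]*){9}$")


def matching_line_numbers(text):
    """Return the 1-based numbers of the lines that contain exactly nine commas."""
    return [
        number
        for number, line in enumerate(text.splitlines(), 1)
        if NINE_COMMAS.match(line)
    ]


# Hypothetical sample: lines 3 and 5 have nine commas, the rest have eleven.
sample_text = "\n".join([
    "1,2,3,4,5,6,7,8,9,10,11,",
    "," * 11,
    "," * 9,
    "," * 11,
    "1,2,3,4,5,6,7,8,9,",
])

print(matching_line_numbers(sample_text))
```

Splitting the text into lines first keeps [^,]* from running across a line break, which a negated character class would otherwise be free to do.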

Related

Match start and end of word in Multiple line using Perl Regex

I have a multi-line text and need to match from a starting word to an ending word across multiple lines with a Perl command.
Multi-line text:
Start here
line 1
line 2
line 3
End
I tried the command below to match the text, but it only works on a single line. I need it to match across multiple lines.
'(^Start.+End$)'
Those quotes shouldn't be there.
You want the s modifier to make . match any character including line feeds.
You want the m modifier if you want ^ and $ to match start and end of line (as opposed to start and end of string).
A common mistake is to repeatedly match against only one line instead of matching once against the entire text.
For example,
use feature 'say';   # needed for say (or run perl with -E)
my $text = <<'.';
...
Start here
line 1
line 2
line 3
End
...
.
say $1 if $text =~ /(^Start.+End$)/sm;
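For readers more at home in Python, the same two modifiers exist there as re.DOTALL (Perl's s) and re.MULTILINE (Perl's m). This is an illustrative sketch of the idea, not part of the Perl answer; the variable names are arbitrary.

```python
import re

text = "...\nStart here\nline 1\nline 2\nline 3\nEnd\n...\n"

# DOTALL lets . match newlines; MULTILINE lets ^ and $ match at line boundaries.
match = re.search(r"(^Start.+End$)", text, flags=re.DOTALL | re.MULTILINE)
block = match.group(1)
print(block)

# Without the flags, ^ only matches at the very start of the string,
# so the same pattern finds nothing in this text.
no_flags = re.search(r"(^Start.+End$)", text)
print(no_flags)
```

As in the Perl version, the whole text is matched once rather than line by line.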

remove a newline before a specific character in a txt file perl

I have a problem: I have a txt file with many lines that follow a three-line pattern, which for some reason I can't paste, so I have to describe it. The first line looks like this: ">#1M1U7:00204:00340". It can have any number after the ":" but has a fixed number of characters. The next line looks like this: "_F_48_32.0416666667". It can have any number after the last underscore and can be of different lengths. The last line in the pattern is a DNA sequence. What I want is to join the first two lines together.
I want a Perl script that can fix this for me.
Just chomp every first line of the three-line group:
perl -pe 'chomp if 1 == $. % 3' < input > output
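The same idea, dropping the newline from lines 1, 4, 7 and so on, can be sketched in Python. The function name join_header_lines and the DNA sequence in the sample are invented for illustration; only the two header lines come from the question.

```python
def join_header_lines(lines):
    """Strip the trailing newline from every first line of a three-line
    group so that it is joined with the line that follows it."""
    out = []
    for number, line in enumerate(lines, 1):
        if number % 3 == 1:
            line = line.rstrip("\n")
        out.append(line)
    return "".join(out)


# Hypothetical sample; "ACGT" stands in for the real DNA sequence.
sample = [
    ">#1M1U7:00204:00340\n",
    "_F_48_32.0416666667\n",
    "ACGT\n",
]
joined = join_header_lines(sample)
print(joined, end="")
```

Like the one-liner, this relies on the file being a strict repetition of three-line groups with no blank or extra lines.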

SED Command to remove first digits and spaces of each line

I have a simple text file in below format.
1 12658003Y
2 34345345N
3 34653785Y
4 36452342N
5 86747488Y
6 34634543Y
so on
10 37456338Y
11 33535555Y
12 37456378Y
so on
100 23432434Y
As you can see there are two white spaces after first number.
I'm trying to write a sed command to remove the digits before the whitespace. Is there a sed command that removes the leading number and the spaces after it?
Output file should look like below.
12658003Y
34345345N
34653785Y
36452342N
so on..
Please assist. I'm very new to shell scripting.
sed 's/[0-9]\+\s\+//' infile > outfile
Explanation:
s: we want to use substitution
/: marks the start of the expression we want to match
[0-9]: match any digit
\+: match the previous item one or more times (GNU sed syntax)
\s: match a whitespace character (GNU sed syntax)
\+: match the previous item one or more times
/: marks the end of the match and the start of the replacement (which is empty)
/: marks the end of the replacement; flags would go after this (we use none)
infile: the file we want to read
>: redirect stdout to
outfile: where we want to store the output
Your sed command would be,
sed 's/.* //g' file
Because .* is greedy, this removes everything up to and including the last space on each line, which here is the leading number and the spaces that follow it.
Remove leading digits, then following spaces:
sed 's/^[0-9]* *//' file
sed 's/^[0-9]*[ ]*//g' input.txt
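If sed is not available, the anchored substitution from the answers above can be sketched in Python with re.sub. The function name strip_leading_number is made up, and the sample lines are copied from the question.

```python
import re


def strip_leading_number(line):
    """Remove a leading run of digits and the whitespace that follows it."""
    return re.sub(r"^[0-9]+\s+", "", line)


sample = ["1  12658003Y", "10  37456338Y", "100  23432434Y"]
cleaned = [strip_leading_number(line) for line in sample]
print("\n".join(cleaned))
```

Anchoring the pattern with ^ means only the number at the start of the line is touched, even if the rest of the line were to contain spaces.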

How to use 'sed or gawk' to delete a text block until the third line previous the last one

Good day,
I was wondering how to delete a text block like this:
1
2
3
4
5
6
7
8
and delete everything between the second line and the third-to-last line, to obtain:
1
2
6
7
8
Thanks in advance!!!
BTW, this text block is just an example; the real text blocks I'm working on are huge, and each one has a different number of lines.
Getting the number of lines with wc and using awk to print the requested range:
$ awk 'NR<M || NR>N-M' M=3 N="$(wc -l < file)" file
1
2
6
7
8
This allows you to easily change the range by just changing the value of M.
This might work for you (GNU sed):
sed '3,${:a;$!{N;s/\n/&/3;Ta;D}}' file
or, if you prefer:
sed '1,2b;:a;$!{N;s/\n/&/3;Ta;D}' file
These always print the first two lines, then build a running window of three lines.
Unless the end of file is reached the first line is popped off the window and deleted. At the end of file the remaining 3 lines are printed.
Since you mentioned that the blocks are huge and that the line counts differ, I would suggest this awk one-liner:
awk 'NR<3{print;next}{delete a[NR-3];a[NR]=$0}END{for(x=NR-2;x<=NR;x++)print a[x]}' file
It processes the input file only once, without (pre)calculating the total number of lines.
It keeps minimal data in memory: at any point during processing, only 3 lines are stored.
If you want to change the filtering criteria, for example to remove from line x to $-y, you simply change the offsets in the one-liner.
A test:
kent$ seq 8|awk 'NR<3{print;next}{delete a[NR-3];a[NR]=$0}END{for(x=NR-2;x<=NR;x++)print a[x]}'
1
2
6
7
8
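The single-pass idea (print the first lines immediately, keep only a small rolling window for the tail) can also be sketched in Python with collections.deque. The function name keep_head_and_tail and its head/tail parameters are assumptions made for this illustration.

```python
from collections import deque


def keep_head_and_tail(lines, head=2, tail=3):
    """Yield the first `head` lines and the last `tail` lines of `lines`,
    reading the input once and buffering at most `tail` lines."""
    window = deque(maxlen=tail)   # old entries fall off automatically
    for number, line in enumerate(lines, 1):
        if number <= head:
            yield line
        else:
            window.append(line)
    yield from window


result = list(keep_head_and_tail(str(n) for n in range(1, 9)))
print("\n".join(result))
```

Changing head and tail plays the same role as changing the offsets in the awk one-liner. An input shorter than head + tail lines is passed through unchanged.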
Using sed:
sed -n '
## Append second line, print first two lines and delete them.
N;
p;
s/^.*$//;
## Read next three lines removing leading newline character inserted
## by the "N" command.
N;
s/^\n//;
N;
:a;
N;
## I will keep three lines in buffer until last line when I will print
## them and exit.
$ { p; q };
## Not last line yet, so remove one line of buffer based in FIFO algorithm.
s/^[^\n]*\n//;
## Goto label "a".
ba
' infile
It yields:
1
2
6
7
8

SED: Operate on Last seven lines regardless of file length

I would like to operate on the last 7 lines of a file with sed, regardless of the file length.
According to a related question this type of range won't work: $-6,$ {..commands..}
What is the equivalent that will?
Pipe the output of tail -7 into sed.
tail -7 test.txt | sed -e "s/e/WWW/"
You could just switch from sed(1) to ed(1), the commands are about the same. In this case, the command is the same, except with no limitations on address range.
$ cat > fl7.ed
ed - $1 << \eof
1,7s/$/ (one of the first seven lines)/
$-6,$s/$/ (one of the last seven lines)/
w
q
eof
$ sh fl7.ed yourfile
perl -lne 'END{print join$\,@a,"-",@b}push@a,$_ if@a<6;push@b,$_;shift@b if@b>7'
In the END{} block you can do whatever is required; @a contains the first 6, @b the last 7 lines as requested.
This should work for you:
sed '1{N;N;N;N;N};N;$s/foo/bar/g;P;D' inputfile
Explanation:
1{N;N;N;N;N} - when the first line is read, load the pattern space with five more lines (total: 6 at this point)
N - append another line
$s/foo/bar/g - when the last line is read, perform some operation on the entire contents of pattern space (the last seven lines of the file). Operations can be more complex than shown here
P - print the text before the first newline in pattern space
D - delete the text just printed and loop to the beginning of the script (the "append another line" step - the first instruction is skipped since it only applies to the first line in the file)
This might work for you:
sed ':a;1,6{$!N;ba};${s/foo/bar/g;q};N;D' file
Explanation:
Create a loop label. :a
Gather lines 1 to 6 in the pattern space (PS). 1,6{$!N;ba}
If it's the last line, process the PS and quit, therefore printing out the last seven lines. ${s/foo/bar/g;q}
If it's not the last line, append the next line to the PS. N
Delete up to the first newline and begin a new cycle without reading a new line. D
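The sliding-window technique used by the two sed answers can be sketched in Python as follows: lines pass through untouched until they fall out of a seven-line buffer, and whatever is left in the buffer at end of input is transformed. The names transform_last_lines, transform and count are hypothetical, and the foo/bar replacement only mirrors the sed examples.

```python
from collections import deque


def transform_last_lines(lines, transform, count=7):
    """Yield every line unchanged except the last `count`, which are
    passed through `transform`. Only `count` lines are buffered."""
    window = deque()
    for line in lines:
        window.append(line)
        if len(window) > count:
            # This line can no longer be among the last `count` lines.
            yield window.popleft()
    for line in window:
        yield transform(line)


sample = [f"foo {n}" for n in range(1, 11)]
result = list(transform_last_lines(sample, lambda s: s.replace("foo", "bar")))
print("\n".join(result))
```

Like the sed versions, this reads the input once and never needs to know the file length in advance.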