No output when running the Spark NetworkWordCount example (Scala)

I am a beginner with Spark, and I am trying to use Docker to run the Spark example NetworkWordCount, but I get no output when I run it.
First I start a new terminal with docker exec -it container_id bash and run nc:
root@sandbox:/usr/local/spark# nc -lk 9999
a ba c d e f g
(the same line, entered 19 times in total)
Then I start another terminal:
root@sandbox:/usr/local/spark# bin/run-example streaming.NetworkWordCount localhost 9999
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/05/12 02:55:57 INFO StreamingExamples: Setting log level to [WARN] for streaming example. To override add a custom log4j.properties to the classpath.
16/05/12 02:55:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
...
Can anyone help me?

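Running the example with the command below worked for me. For background, the Spark Streaming programming guide advises using "local[n]" with n greater than the number of receivers when running locally, because each receiver occupies one thread; with a single thread, data is received but never processed.
[root@quickstart spark]# ./bin/run-example streaming.NetworkWordCount localhost 9999 --master "local[2]"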
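Once the job has a free thread to process batches, it should print word counts for the lines typed into nc. As a rough reference for what to expect, here is a plain-Python sketch of the per-batch computation: the Scala example splits each line on spaces, maps every word to (word, 1), and sums with reduceByKey. The function name count_batch is hypothetical, and this runs without Spark.

```python
from collections import Counter
from typing import Dict, Iterable


def count_batch(lines: Iterable[str]) -> Dict[str, int]:
    """Count words in one micro-batch of input lines.

    Mirrors the example's split-on-" " -> (word, 1) -> reduceByKey(+)
    pipeline, but on an ordinary Python iterable instead of a DStream.
    """
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split(" "))  # split on single spaces, as the example does
    return dict(counts)


batch = ["a ba c d e f g", "a ba c d e f g"]
print(count_batch(batch))  # {'a': 2, 'ba': 2, 'c': 2, 'd': 2, 'e': 2, 'f': 2, 'g': 2}
```

If two copies of the test line arrive within the same batch interval, each word should show a count of 2 in that batch's output.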


Spaces in nssm.exe command output in PowerShell

I'm using nssm.exe in my scripts to manage Windows services. In PowerShell, however, the command output has a space after every character.
PS> $nssm = (Get-Command D:\nssm.exe)
PS> & $nssm
nssm.exe : N S S M : T h e n o n - s u c k i n g s e r v i c e m a n a g e r
At line:1 char:1
+ & $nssm
+ ~~~~~~~
+ CategoryInfo : NotSpecified: (N S S M : T h... m a n a g e r :String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
V e r s i o n 2 . 2 4 3 2 - b i t , 2 0 1 4 - 0 8 - 3 1
U s a g e : n s s m < o p t i o n > [ < a r g s > . . . ]
T o s h o w s e r v i c e i n s t a l l a t i o n G U I :
n s s m i n s t a l l [ < s e r v i c e n a m e > ]
T o i n s t a l l a s e r v i c e w i t h o u t c o n f i r m a t i o n :
n s s m i n s t a l l < s e r v i c e n a m e > < a p p > [ < a r g s > . . . ]
T o s h o w s e r v i c e e d i t i n g G U I :
n s s m e d i t < s e r v i c e n a m e >
How can I get the output without these extra spaces between the characters?

Extra spaces added when reading a Notepad text file in PowerShell

I need to find the position of a particular string in a line of a text file saved from Notepad. When PowerShell reads the file, however, extra spaces get added, so I can't find the position. How can I find it in this case?
I am using this code:
$ParamsPathForData = ($dir + "\TimeStats\TimeStats_1slot\29_12_2015_07TimeStats1.txt")
$data = Get-Content $ParamsPathForData
write-host $data.count total lines read from file
foreach ($line in $data)
{
$l =$line.IndexOf("12/29/2015")
write-host $l
}
This is the line I am reading from the Notepad file:
TimeStats 29 12/29/2015 7:13:42 AM +00:00 Debug PREPROCESS: SlotNo: 325-00313, Ip Address: 10.2.200.15, Duplicate Message: False, Player-Card-No: , MessageId: 883250003130047966, MessageName: GameIdInfo, Thread Init Delay: 14, Time To Parse: 155, Time To Exec Main Workflow: 424, Time To Construct & send Response: 22, Total Response Time: 615
But when I run the script in PowerShell, I get this, with extra spaces:
T i m e S t a t s 2 9 1 2 / 2 9 / 2 0 1 5 7 : 1 3 : 4 2 A M
+ 0 0 : 0 0 D e b u g P R E P R O C E S S : S l o t N o : 3 2 5 - 0 0 3 1 3 , I p A d d r e s s : 1 0 . 2 . 2 0 0 . 1 5 , D
u p l i c a t e M e s s a g e : F a l s e , P l a y e r
- C a r d - N o : , M e s s a g e I d : 8 8 3 2 5 0 0 0 3 1 3 0 0 4 7 9 6 6 , M e s s a g e N a m e : G a m e I d I n f o , T h
r e a d I n i t D e l a y : 1 4 , T i m e T o P a r s e :
1 5 5 , T i m e T o E x e c M a i n W o r k f l o w : 4 2
4 , T i m e T o C o n s t r u c t & s e n d R e s p o n s
e : 2 2 , T o t a l R e s p o n s e T i m e : 6 1 5
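Can anybody help me?
The spaced-out text suggests the file is saved as UTF-16 (what Notepad calls "Unicode"), so change the encoding that Get-Content uses to Unicode: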
$data = Get-Content $ParamsPathForData -Encoding Unicode
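Why the spaces appear: in UTF-16 little-endian, every ASCII character is stored as two bytes, the character followed by a zero byte. When those bytes are read with a single-byte encoding, the zero bytes survive as extra characters that typically display as spaces, so IndexOf can no longer match the date. The same mechanism is a likely explanation for the nssm.exe output above. The Python sketch below reproduces the effect; the sample text comes from the question, and "latin-1" is an assumed stand-in for whatever single-byte code page the reader actually uses.

```python
# Encode a sample line the way a UTF-16 ("Unicode") text file stores it.
text = "TimeStats 29 12/29/2015 7:13:42 AM"
raw = text.encode("utf-16-le")

# Wrong: decoding as a single-byte encoding keeps a NUL after every character.
wrong = raw.decode("latin-1")
print(wrong.find("12/29/2015"))  # -1: the date is not found

# Right: decoding with the file's real encoding restores the original text.
right = raw.decode("utf-16-le")
print(right.find("12/29/2015"))  # 13
```

This is what -Encoding Unicode does on the PowerShell side: it tells Get-Content to decode the bytes as UTF-16 instead of a single-byte encoding.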

How to duplicate each column of a data frame in Perl

I have a big data frame that looks like this:
name1 A A G
name2 C C T
name3 A G G
name4 H G G
name5 C - T
name6 C C C
name7 A G G
name8 G G A
I want the data frame to become:
name1 A A A A G G
name2 C C C C T T
name3 A A G G G G
name4 H H G G G G
name5 C C - - T T
name6 C C C C C C
name7 A A G G G G
name8 G G G G A A
I tried to do this in R, but I ran out of memory. Could someone help me with a Perl solution? I don't know how to write a Perl script. Thanks.
perl -lane'
    BEGIN { $, = "\t" }
    print shift(@F), map { ($_) x 2 } @F
' file
Output:
name1 A A A A G G
name2 C C C C T T
name3 A A G G G G
name4 H H G G G G
name5 C C - - T T
name6 C C C C C C
name7 A A G G G G
name8 G G G G A A
Using a Perl one-liner:
perl -lane 'print join "\t", shift(@F), map {($_) x 2} @F' data.txt
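Both one-liners process the file one line at a time, which is why they avoid the memory problem hit in R. For comparison, here is a streaming Python sketch of the same transformation. The names double_columns and double_file are hypothetical, and, like perl -a, it splits on runs of whitespace and writes tab-separated output; unlike the one-liners, it skips blank lines instead of printing them.

```python
from typing import Iterable, Iterator


def double_columns(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line with the first column kept once and every other column repeated twice."""
    for line in lines:
        fields = line.split()  # split on runs of whitespace, like perl -a
        if not fields:
            continue  # skip blank lines
        name, values = fields[0], fields[1:]
        doubled = [v for v in values for _ in range(2)]  # A G -> A A G G
        yield "\t".join([name, *doubled])


def double_file(in_path: str, out_path: str) -> None:
    """Stream in_path to out_path line by line so memory use stays small."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for out_line in double_columns(src):
            dst.write(out_line + "\n")
```

For example, double_file("data.txt", "doubled.txt") would write name1 A A A A G G (tab-separated) for the first input row.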

Complex looping in MATLAB

Suppose five matrices are given as:
A= [A1 A1 A1 A1 A1; A2 A2 A2 A2 A2; A3 A3 A3 A3 A3]
B= [B1 B1 B1 B1 B1; B2 B2 B2 B2 B2;B3 B3 B3 B3 B3]
C=[ C1 C1 C1 C1 C1; C2 C2 C2 C2 C2; C3 C3 C3 C3 C3]
D= [D1 D1 D1 D1 D1 ; D2 D2 D2 D2 D2; D3 D3 D3 D3 D3]
E=[ E1 E1 E1 E1 E1; E2 E2 E2 E2 E2; E3 E3 E3 E3 E3]
I want to write a program whose output is formed by taking the corresponding row of each given matrix and stacking those rows into a new matrix. How can I use a loop for this when both the number of columns and the number of matrices grow? This seems complex to me because I want to generalize with a loop to any number of matrices, say 20, each with more columns, say 25, and then obtain the outputs P1 to P20. Can anyone help me with this in MATLAB?
P1=[ A1 A1 A1 A1 A1; B1 B1 B1 B1 B1; C1 C1 C1 C1 C1; D1 D1 D1 D1 D1; E1 E1 E1 E1 E1]
P2=[ A2 A2 A2 A2 A2; B2 B2 B2 B2 B2; C2 C2 C2 C2 C2; D2 D2 D2 D2 D2; E2 E2 E2 E2 E2]
and the other matrices are obtained similarly.
Note: the five given matrices are themselves generated in a loop, so at first I get values like:
A= A1
B= B1
C=C1
D=D1
E=E1
A= A1 A1
B= B1 B1
C=C1 C1
D=D1 D1
E=E1 E1
... and so on.
Use a loop to put all the matrices together into a 3-D array, or simply store each matrix in the 3-D array as soon as it is created:
M(:,:,1) = A; M(:,:,2) = B; etc.
Then squeeze(M(1,:,:))' is P1, squeeze(M(2,:,:))' is P2, and in general squeeze(M(k,:,:))' is Pk.
Example:
M(:,:,1) =
1 2
3 4
M(:,:,2) =
5 6
7 8
>> squeeze(M(1,:,:))'
ans =
1 2
5 6
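The 3-D array trick is a regrouping: output k collects row k of every input matrix, in input order. A plain-Python sketch of the same regrouping follows, to show that it works for any number of matrices and any number of columns. The function name regroup_rows is hypothetical, matrices are assumed to be lists of row lists, and all of them are assumed to have the same number of rows (zip stops at the shortest one otherwise).

```python
from typing import List, Sequence

Matrix = List[List[float]]


def regroup_rows(matrices: Sequence[Matrix]) -> List[Matrix]:
    """Return P where P[k] stacks row k of every input matrix, in input order.

    Assumes every matrix has the same number of rows; the number of
    matrices and the number of columns can be anything.
    """
    if not matrices:
        return []
    # zip(*matrices) walks the matrices in lockstep, yielding row k of each one.
    return [[list(row) for row in rows_k] for rows_k in zip(*matrices)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
P = regroup_rows([A, B])
print(P[0])  # [[1, 2], [5, 6]], matching squeeze(M(1,:,:))' above
print(P[1])  # [[3, 4], [7, 8]]
```

Here P[0] plays the role of P1; MATLAB indexes from 1, Python from 0.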

Read a tab-delimited file, count occurrences, and delete rows

I am fairly new to programming and am trying to solve this problem. I have a file like this:
CHROM POS REF ALT 10_sample.bam 11_sample.bam 12_sample.bam 13_sample.bam 14_sample.bam 15_sample.bam 16_sample.bam
tg93 77 T C T T T T T
tg93 79 C - C C C - -
tg93 79 C G C C C C G C
tg93 80 G A G G G G A A G
tg93 81 A C A A A A C C C
tg93 86 C A C C A A A A C
tg93 105 A G A A A A A G A
tg93 108 A G A A A A G A A
tg93 114 T C T T T T T C T
tg93 131 A C A A A A A A A
tg93 136 G C C G C C G G G
tg93 150 CTCTC - CTCTC - CTCTC CTCTC
In the header:
CHROM - name
POS - position
REF - reference
ALT - alternate
10_sample.bam to 16_sample.bam - samples
Now I want to count how many times the REF and ALT values occur in the sample columns. If either of them occurs fewer than two times, I need to delete that row.
For example:
In the first row, REF is 'T' and ALT is 'C'. Across the 7 samples there are 5 T's, 2 blanks, and no C, so I need to delete this row.
In the second row, REF is 'C' and ALT is '-'. Across the 7 samples there are 3 C's, 2 '-'s, and 2 blanks, so we keep this row, since C and - each occur at least 2 times.
Blanks are always ignored when counting.
The final file after filtering is:
#CHROM POS REF ALT 10_sample.bam 11_sample.bam 12_sample.bam 13_sample.bam 14_sample.bam 15_sample.bam 16_sample.bam
tg93 79 C - C C C - -
tg93 80 G A G G G G A A G
tg93 81 A C A A A A C C C
tg93 86 C A C C A A A A C
tg93 136 G C C G C C G G G
I can read the columns into arrays and print them, but I am not sure how to write the loops that read each base, count its occurrences, and keep or drop the row. Can anyone tell me how to proceed? Example code that I can modify would also be helpful.
#!/usr/bin/env perl
use strict;
use warnings;

print scalar(<>);                        # Read and output the header.
while (<>) {                             # Read a line.
    chomp;                               # Remove the newline from the line.
    my ($chrom, $pos, $ref, $alt, @samples) =
        split /\t/;                      # Parse the remainder of the line.
    my %counts;                          # Count the occurrences of sample values.
    ++$counts{$_} for @samples;          # e.g. Might end up with $counts{"G"} = 3.
    print "$_\n"                         # Print line if we want to keep it.
        if ($counts{$ref} || 0) >= 2     # ("|| 0" avoids a spurious warning.)
        && ($counts{$alt} || 0) >= 2;
}
Output:
CHROM POS REF ALT 10_sample.bam 11_sample.bam 12_sample.bam 13_sample.bam 14_sample.bam 15_sample.bam 16_sample.bam
tg93 79 C - C C C - -
tg93 80 G A G G G G A A G
tg93 81 A C A A A A C C C
tg93 86 C A C C A A A A C
tg93 136 G C C G C C G G G
You included 108 in your desired output, but it only has one instance of ALT in the seven samples.
Usage:
perl script.pl file.in >file.out
Or in-place:
perl -i script.pl file
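The core of the Perl answer is a per-row frequency count over the sample columns followed by a threshold test on REF and ALT. The same logic as a Python sketch, for readers who want to port it: it assumes tab-separated input where a blank sample is an empty field, and the names filter_rows and min_count are hypothetical. Unlike the Perl script, it drops empty fields explicitly before counting, which makes the "ignore blanks" rule visible.

```python
from collections import Counter
from typing import Iterable, Iterator


def filter_rows(lines: Iterable[str], min_count: int = 2) -> Iterator[str]:
    """Yield the header, then only the rows whose REF and ALT each occur
    at least min_count times among the sample columns (blanks ignored)."""
    it = iter(lines)
    header = next(it, None)
    if header is None:
        return  # empty input: nothing to yield
    yield header.rstrip("\n")
    for line in it:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip empty or malformed lines
        ref, alt, samples = fields[2], fields[3], fields[4:]
        counts = Counter(s for s in samples if s != "")  # ignore blank samples
        # Counter returns 0 for missing keys, so no special case is needed.
        if counts[ref] >= min_count and counts[alt] >= min_count:
            yield line.rstrip("\n")
```

To filter a file, one could write each yielded line to an output file, e.g. iterating over filter_rows(open("file.in")).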
Here's an approach that does not assume tab-separated fields:
use strict;
use warnings;
use IO::All;

my $chrom = "tg93";
my @lines = io('file.txt')->slurp;
foreach (@lines) {
    my %letters;
    # use a regex with capture groups to extract the data - this method does not depend on tab-separated fields
    if (/$chrom\s+\d+\s+([A-Z-\s])\s{3}([A-Z-\s])\s{3}([A-Z-\s])\s{3}([A-Z-\s])\s{3}([A-Z-\s])\s{3}([A-Z-\s])\s{3}([A-Z-\s])\s{3}([A-Z-\s])\s{3}([A-Z-\s])/) {
        # initialize hash counts for REF ($1) and ALT ($2)
        $letters{$1} = 0;
        $letters{$2} = 0;
        # loop through the samples and increment the counter when matches are found
        foreach ($3, $4, $5, $6, $7, $8, $9) {
            if ($_ eq $1) {
                ++$letters{$1};
            }
            if ($_ eq $2) {
                ++$letters{$2};
            }
        }
        # if the counts for both REF and ALT are greater than or equal to 2, print the line
        if ($letters{$1} >= 2 && $letters{$2} >= 2) {
            print $_;
        }
    }
}