Text file processing using Spark Scala

I want to fetch a particular portion of a text file using Spark Scala.
Is there any built-in function for this?
If it can be done using a regex, how do I do that?
The data starts from the line below:
/bin/rm: cannot unlink `/fabos/link_sbin/lscfg_test': Permission denied
Non-VF
======================
Date:
Mon Jul 8 08:48:40 CEST 2019
Time Zone:
Europe/Berlin
Version:
Kernel: 2.6.14.2
Fabric OS: v7.4.2a
Made on: Thu Jun 29 17:22:14 2017
Flash: Tue Oct 10 09:27:26 2017
BootProm: 1.0.11
supportshow groups enabled:
Unknown key pm:0
os enabled
exception enabled
port enabled
fabric enabled
services enabled
security enabled
network enabled
portlog enabled
system enabled
extend disabled
filter disabled
ficon disabled
iswitch enabled
asic_db enabled
fcip disabled (not applicable to this platform)
ag enabled
dce_hsl enabled
Begin start_port_log_cmd group
Mon Jul 8 08:48:44 CEST 2019
portlogdump:
portlogdump :
time task event port cmd args
-------------------------------------------------
Mon Jul 8 03:27:51 2019
03:27:51.199 FCPH seq 13 28 00300000,00000000,00000591,00020182,00000000
03:27:51.199 PORT Rx 11 0 c0fffffd,00fffffd,0ed10335,00000001
03:27:51.200 PORT Tx 13 40 02fffffd,00fffffd,0ed3ffff,14000000
03:27:51.200 PORT Rx 13 0 c0fffffd,00fffffd,0ed329ae,00000001
03:27:59.377 PORT Rx 15 40 02fffffd,00fffffd,0336ffff,14000000
03:27:59.377 PORT Tx 15 0 c0fffffd,00fffffd,03360ed2,00000001
03:27:59.377 FCPH read 15 40 02fffffd,00fffffd,d0000000,00000000,03360ed2
03:27:59.377 FCPH seq 15 28 22380000,03360ed2,0000052b,0000001c,00000000
03:28:00.468 PORT Rx 13 40 02fffffd,00fffffd,29afffff,14000000
03:28:00.468 PORT Tx 13 0 c0fffffd,00fffffd,29af0ed5,00000001
03:28:00.469 FCPH read 13 40 02fffffd,00fffffd,66000000,00000000,29af0ed5
03:28:00.469 FCPH seq 13 28 22380000,29af0ed5,0000052b,0000001c,00000000
03:28:01.197 FCPH write 15 40 00fffffd,00fffffd,00000000,00000000,00000000
03:28:01.197 FCPH seq 15 28 00300000,00000000,00000591,00020182,00000000
03:28:01.197 PORT Tx 15 40 02fffffd,00fffffd,0ed4ffff,14000000
03:28:01.198 PORT Rx 15 0 c0fffffd,00fffffd,0ed40338,00000001
03:28:09.380 PORT Rx 11 40 02fffffd,00fffffd,033affff,14000000
03:28:09.380 PORT Tx 11 0 c0fffffd,00fffffd,033a0ed6,00000001
03:28:09.380 FCPH read 11 40 02fffffd,00fffffd,d5000000,00000000,033a0ed6
03:28:09.380 FCPH seq 11 28 22380000,033a0ed6,0000052b,0000001c,00000000
The expected output is like below. I want the data from a particular line onwards, and that line can be anything (here, from the time header onwards):
+------------+----+-----+----+---+------------------------------------+
|time        |task|event|port|cmd|args                                |
+------------+----+-----+----+---+------------------------------------+
|03:27:51.199|PORT|Rx   |11  |0  |c0fffffd,00fffffd,0ed10335,00000001 |
|03:27:51.200|PORT|Tx   |13  |40 |02fffffd,00fffffd,0ed3ffff,14000000 |
|03:27:51.200|PORT|Rx   |13  |0  |c0fffffd,00fffffd,0ed329ae,00000001 |
|03:27:59.377|PORT|Rx   |15  |40 |02fffffd,00fffffd,0336ffff,14000000 |
|03:27:59.377|PORT|Tx   |15  |0  |c0fffffd,00fffffd,03360ed2,00000001 |
+------------+----+-----+----+---+------------------------------------+
This is sample data; I want the data starting from the line that reads:
time task event port cmd args

scala> val serverData = spark.read.textFile("serverData.txt")

// get the column names from the header row of the raw data
scala> val schemaArr = serverData.filter(_.matches("\\btime.*\\b")).collect.mkString.split("\\s+").toList

// keep only the data rows (they are the only lines starting with a digit)
scala> val reqData = serverData.filter(_.matches("^\\d.*"))

// split each row on whitespace and name the six columns after the header
scala> val df = reqData.map { x =>
         val cols = x.split("\\s+")
         (cols(0), cols(1), cols(2), cols(3), cols(4), cols(5))
       }.toDF(schemaArr: _*)
Please try the regex above to extract your data and take it from there. Hope this helps!
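Since you said the starting line "can be anything", here is a rough index-based sketch (assuming the header row occurs exactly once in the file; file name and regexes are the same placeholders as above) that drops everything up to and including that line, instead of pattern-matching the data rows alone:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("serverData").getOrCreate()
import spark.implicits._

// Pair each line with its position, locate the header row, then keep only
// the lines after it that begin with an HH:MM:SS.mmm timestamp.
val indexed   = spark.read.textFile("serverData.txt").rdd.zipWithIndex
val headerIdx = indexed.filter(_._1.trim.matches("""time\s+task\s+event.*""")).first._2
val dataRows  = indexed
  .filter { case (line, idx) => idx > headerIdx && line.matches("""^\d{2}:\d{2}:\d{2}\.\d{3}\s.*""") }
  .keys
dataRows.take(5).foreach(println)

zipWithIndex preserves the file's line order here, so everything before the header (including the "Permission denied" noise at the top) is dropped regardless of what it looks like.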

Related

Zookeeper znode delete

echo mntr | nc localhost 2181
zk_version 3.4.6-78--1, built on 12/06/2018 12:30 GMT
zk_avg_latency 319
zk_max_latency 13406
zk_min_latency 0
zk_packets_received 1847226
zk_packets_sent 1782230
zk_num_alive_connections 437
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count 3188127
zk_watch_count 21
zk_ephemerals_count 27
zk_approximate_data_size 651278666
zk_open_file_descriptor_count 473
zk_max_file_descriptor_count 4096
zk_fsync_threshold_exceed_count 1
zk_znode_count 3188127
For me, zk_znode_count looks very high.
Can anyone please help me with how to list all znodes?
After checking the znode details, on what criteria should I decide which ones to delete?
It would be helpful, because my cluster services are always in active/active or standby/standby mode.
Thanks in advance.
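For what it's worth, a rough sketch of one way to see where the znodes accumulate, using the official ZooKeeper Java client from Scala (the connection string, timeout, and object name are illustrative assumptions):

import org.apache.zookeeper.{WatchedEvent, Watcher, ZooKeeper}
import scala.collection.JavaConverters._

// Count znodes under each top-level path to see where the ~3.2M znodes live.
object ZnodeCensus {
  def count(zk: ZooKeeper, path: String): Long = {
    val children = zk.getChildren(path, false).asScala
    1L + children.map { c =>
      count(zk, if (path == "/") s"/$c" else s"$path/$c")
    }.sum
  }

  def main(args: Array[String]): Unit = {
    val zk = new ZooKeeper("localhost:2181", 15000, new Watcher {
      override def process(event: WatchedEvent): Unit = ()
    })
    try {
      zk.getChildren("/", false).asScala.foreach { root =>
        println(s"/$root -> ${count(zk, s"/$root")} znodes")
      }
    } finally zk.close()
  }
}

With 3M+ znodes a full walk is slow, so counting only one or two levels deep usually pinpoints the offending subtree. Deletion criteria are application-specific: only remove znodes that are provably stale or that the owning service can recreate; ZooKeeper itself cannot tell you which ones are safe to delete.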

llvm-cov fails to generate report when run on cloud GitLab CI

I have been running the following llvm-cov report command (which ships as part of the Swift toolchain) in Docker images (swift:5.1) across various environments.
BINARY_PATH="..."
PROF_DATA_PATH="..."
IGNORE_FILENAME_REGEX="..."
llvm-cov report \
$BINARY_PATH \
--format=text \
-instr-profile="$PROF_DATA_PATH" \
-ignore-filename-regex="$IGNORE_FILENAME_REGEX"
When the docker image is hosted on any machine aside from GitLab's cloud docker runners, I get the expected code coverage output:
Filename Regions Missed Regions Cover Functions Missed Functions Executed Lines Missed Lines Cover
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ChannelHandlers/RedisByteDecoder.swift 5 0 100.00% 3 0 100.00% 10 0 100.00%
ChannelHandlers/RedisCommandHandler.swift 15 5 66.67% 8 3 62.50% 45 11 75.56%
ChannelHandlers/RedisMessageEncoder.swift 3 1 66.67% 3 1 66.67% 13 6 53.85%
Commands/BasicCommands.swift 28 4 85.71% 16 2 87.50% 99 7 92.93%
Commands/HashCommands.swift 38 4 89.47% 29 1 96.55% 156 1 99.36%
Commands/ListCommands.swift 56 8 85.71% 48 5 89.58% 217 11 94.93%
Commands/SetCommands.swift 46 12 73.91% 30 4 86.67% 147 4 97.28%
Commands/SortedSetCommands.swift 172 19 88.95% 105 6 94.29% 555 18 96.76%
Commands/StringCommands.swift 23 2 91.30% 21 1 95.24% 100 1 99.00%
Extensions/StandardLibrary.swift 10 2 80.00% 6 1 83.33% 21 1 95.24%
Extensions/SwiftNIO.swift 9 1 88.89% 7 0 100.00% 38 1 97.37%
RESP/RESPTranslator.swift 69 7 89.86% 10 2 80.00% 172 10 94.19%
RESP/RESPValue.swift 39 11 71.79% 14 3 78.57% 69 17 75.36%
RESP/RESPValueConvertible.swift 52 19 63.46% 15 3 80.00% 99 22 77.78%
RedisClient.swift 2 0 100.00% 2 0 100.00% 7 0 100.00%
RedisConnection.swift 72 23 68.06% 47 10 78.72% 228 31 86.40%
RedisErrors.swift 12 4 66.67% 6 1 83.33% 23 3 86.96%
RedisKey.swift 15 9 40.00% 12 6 50.00% 38 20 47.37%
RedisMetrics.swift 9 2 77.78% 9 2 77.78% 23 2 91.30%
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL 675 133 80.30% 391 51 86.96% 2060 166 91.94%
However, when the same Docker images running the same commands are hosted on GitLab's cloud runners:
Filename Regions Missed Regions Cover Functions Missed Functions Executed Lines Missed Lines Cover
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL 0 0 - 0 0 - 0 0 -
I'm verifying via ls -l that the code coverage data is produced correctly by the Swift Package Manager, and in every environment (including GitLab CI) I get:
Profdata: -rw-r--r--. 1 root root 575608 Feb 8 19:51 .build/x86_64-unknown-linux/debug/codecov/default.profdata
Test binary: -rwxr-xr-x. 1 root root 16309424 Feb 8 19:51 .build/x86_64-unknown-linux/debug/redi-stackPackageTests.xctest
This also happens with LLVM-8 and LLVM-9 (LLVM-7 ships with Swift 5.1).
For the life of me, I can't figure out why.
Environments I've tested (all running Docker Engine 19+):
+----------------+-----------------+----------------+-------------------+
| HOST           | OS              | CPU            | Generates Report? |
+----------------+-----------------+----------------+-------------------+
| iMac 2011      | High Sierra     | sandybridge    | YES               |
| MBP 2019       | Catalina        | skylake        | YES               |
| mac mini 2018  | Catalina        | skylake        | YES               |
| GitHub Actions | 'ubuntu-latest' | skylake-avx512 | YES               |
| GitLab CI      | 'tags: docker'  | haswell        | NO                |
+----------------+-----------------+----------------+-------------------+
Relevant bug reports:
Swift
GitLab

RISC V manual confusion: instruction format VS immediate format

I have some questions about the RISC-V manual.
It has different types of instruction encodings, such as R-type and I-type, just like the MIPS encoding.
* R-type
31 25 24 20 19 15 14 12 11 7 6 0
+------------+---------+---------+------+---------+-------------+
| funct7 | rs2 | rs1 |funct3| rd | opcode |
+------------+---------+---------+------+---------+-------------+
* I-type
31 20 19 15 14 12 11 7 6 0
+----------------------+---------+------+---------+-------------+
| imm | rs1 |funct3| rd | opcode |
+----------------------+---------+------+---------+-------------+
* S-type
31 25 24 20 19 15 14 12 11 7 6 0
+------------+---------+---------+------+---------+-------------+
| imm | rs2 | rs1 |funct3| imm | opcode |
+------------+---------+---------+------+---------+-------------+
* U-type
31 11 7 6 0
+---------------------------------------+---------+-------------+
| imm | rd | opcode |
+---------------------------------------+---------+-------------+
But it also has something called immediate formats, such as I-immediate, S-immediate, and so on:
* I-immediate
31 10 5 4 1 0
+-----------------------------------------+-----------+-------+--+
| <-- 31 | 30:25 | 24:21 |20|
+-----------------------------------------+-----------+-------+--+
* S-immediate
31 10 5 4 1 0
+-----------------------------------------+-----------+-------+--+
| <-- 31 | 30:25 | 11:8 |7 |
+-----------------------------------------+-----------+-------+--+
* B-immediate
31 12 11 10 5 4 1 0
+--------------------------------------+--+-----------+-------+--+
| <-- 31 |7 | 30:25 | 11:8 |z |
+--------------------------------------+--+-----------+-------+--+
* U-immediate
31 30 20 19 12 11 0
+--+-------------------+---------------+-------------------------+
|31| 30:20 | 19:12 | <-- z |
+--+-------------------+---------------+-------------------------+
* J-immediate
31 20 19 12 11 10 5 4 1 0
+----------------------+---------------+--+-----------+-------+--+
| <-- 31 | 19:12 |20| 30:25 | 24:21 |z |
+----------------------+---------------+--+-----------+-------+--+
According to the manual, these immediates are produced by RISC-V instructions, but how are the two formats related?
What is the point of having immediate formats?
The 2nd set of diagrams is showing you how the immediate bits are concatenated and sign-extended into a 32-bit integer (so they can work as a source operand for normal 32-bit ALU instructions like addi which need both their inputs to be the same size).
For I-type instructions it's trivial, just arithmetic right-shift the instruction word by 20 bits, because there's only one immediate field, and it's contiguous at the top of the instruction word.
For S-type immediate instructions, there are two separate fields in the instruction word: [31:25] and [11:7], and this shows you that they're in that order, not [11:7, 31:25] and not with any implicit zeros between them.
B-type immediate instructions apparently put bit 7 in front of [30:25], and the low bit is an implicit zero. (So the resulting number is always even). I assume B-type is for branches.
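To make the bit shuffling concrete, here is a rough sketch (in Scala, whose 32-bit Int and arithmetic shifts match the hardware description) of the I-, S-, and B-immediate reconstruction; inst is the raw 32-bit instruction word:

// I-immediate: inst[31:20], sign-extended. Arithmetic >> does the extension.
def immI(inst: Int): Int = inst >> 20

// S-immediate: imm[11:5] = inst[31:25] (sign-extended), imm[4:0] = inst[11:7].
def immS(inst: Int): Int = {
  val hi = inst >> 25            // keeps the sign in the upper bits
  val lo = (inst >>> 7) & 0x1f
  (hi << 5) | lo
}

// B-immediate: imm[12] = inst[31], imm[11] = inst[7],
// imm[10:5] = inst[30:25], imm[4:1] = inst[11:8], imm[0] = 0 (always even).
def immB(inst: Int): Int = {
  val b12   = inst >> 31         // 0 or -1: sign-extends imm[12]
  val b11   = (inst >>> 7) & 1
  val b10_5 = (inst >>> 25) & 0x3f
  val b4_1  = (inst >>> 8) & 0xf
  (b12 << 12) | (b11 << 11) | (b10_5 << 5) | (b4_1 << 1)
}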
U-type is also interesting, padding the 20-bit immediate with trailing zeros. It's used for lui to create the upper bits of 32-bit constants (with addi supplying the rest). It's not a coincidence that U-type and I-type together have 32 total immediate bits.
To access static data, lui can create the high part of an address while lw can supply the low part directly, instead of using an addi to create the full address in a register. This is typical for RISC ISAs like MIPS and PowerPC as well (see an example on the Godbolt compiler explorer). But unlike most other RISC ISAs, RISC-V has auipc which adds the U-type immediate to the program counter, for efficient PIC without having to load addresses from a GOT (global offset table). (A recent MIPS revision also added an add-to-PC instruction, but for a long time MIPS was quite bad at PIC).
lui can encode any 4k-aligned address, i.e. a page-start address with 4k pages.
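As an illustration of how a full 32-bit constant splits across lui + addi, here is a sketch (mine, not from the manual) of the standard hi/lo split; the +0x800 rounding compensates for addi sign-extending its 12-bit immediate:

// Split x into (hi20, lo12) such that (hi20 << 12) + lo12 == x,
// where lo12 is a signed 12-bit value as addi requires.
def luiAddiSplit(x: Int): (Int, Int) = {
  val hi20 = (x + 0x800) >>> 12   // the value lui loads into the upper 20 bits
  val lo12 = x - (hi20 << 12)     // in [-2048, 2047], supplied by addi
  (hi20, lo12)
}

// e.g. luiAddiSplit(0x12345FFF) == (0x12346, -1):
// lui gives 0x12346000, and addi -1 lands on 0x12345FFF.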

Touchscreen on Raspberry Pi emits click not touch

I followed this link to calibrate the touchscreen: http://www.circuitbasics.com/raspberry-pi-touchscreen-calibration-screen-rotation/
ls -la /dev/input/
total 0
drwxr-xr-x 4 root root 240 Jul 12 18:38 .
drwxr-xr-x 15 root root 3460 Jul 12 18:38 ..
drwxr-xr-x 2 root root 140 Jul 12 18:38 by-id
drwxr-xr-x 2 root root 140 Jul 12 18:38 by-path
crw-rw---- 1 root input 13, 64 Jul 12 18:38 event0
crw-rw---- 1 root input 13, 65 Jul 12 18:38 event1
crw-rw---- 1 root input 13, 66 Jul 12 18:38 event2
crw-rw---- 1 root input 13, 67 Jul 12 18:38 event3
crw-rw---- 1 root input 13, 68 Jul 12 18:38 event4
crw-rw---- 1 root input 13, 63 Jul 12 18:38 mice
crw-rw---- 1 root input 13, 32 Jul 12 18:38 mouse0
crw-rw---- 1 root input 13, 33 Jul 12 18:38 mouse1
root@raspberrypi:/sys/devices/virtual/input# cat input4/uevent
PRODUCT=0/0/0/0
NAME="FT5406 memory based driver"
PROP=2
EV=b
KEY=400 0 0 0 0 0 0 0 0 0 0
ABS=2608000 3
MODALIAS=input:b0000v0000p0000e0000-e0,1,3,k14A,ra0,1,2F,35,36,39,mlsfw
root@raspberrypi:~# cat /etc/ts.conf
# Uncomment if you wish to use the linux input layer event interface
module_raw input
# Uncomment if you're using a Sharp Zaurus SL-5500/SL-5000d
# module_raw collie
# Uncomment if you're using a Sharp Zaurus SL-C700/C750/C760/C860
# module_raw corgi
# Uncomment if you're using a device with a UCB1200/1300/1400 TS interface
# module_raw ucb1x00
# Uncomment if you're using an HP iPaq h3600 or similar
# module_raw h3600
# Uncomment if you're using a Hitachi Webpad
# module_raw mk712
# Uncomment if you're using an IBM Arctic II
# module_raw arctic2
module pthres pmin=1
module variance delta=30
module dejitter delta=100
module linear
I only get a response when configuring X with xinput_calibrator. When I enter this command:
sudo TSLIB_FBDEVICE=/dev/fb0 TSLIB_TSDEVICE=/dev/input/event1 ts_calibrate
I get the output
xres = 800, yres = 480
selected device is not a touchscreen
Can someone please help me understand this?
Thanks in advance.
I don't have a solution for this, but I believe it is related to the problem of touches being treated as mouseovers. This bug has been reported several times but never actually fixed:
https://gitlab.gnome.org/GNOME/gtk/-/issues/945
https://bugzilla.gnome.org/show_bug.cgi?id=789041
https://bugs.launchpad.net/ubuntu-mate/+bug/1792787
A bugzilla.gnome.org user named niteshgupta16 created a script that solves this problem, but it was uploaded to a pasting/sharing service called Hastebin at https://www.hastebin.com/uwuviteyeb.py.
Hastebin deletes files that have not been accessed within 30 days, and since it is a JavaScript-obfuscated service, the file is not available on archive.org.
I am unable to find an email address for niteshgupta16 to ask whether he still has uwuviteyeb.py.

TimeGrouper, pandas

I use TimeGrouper from pandas.tseries.resample to sum monthly returns into 6-month periods as follows:
return_6m = monthly_return.groupby(TimeGrouper(freq='6M')).aggregate(numpy.sum)
where monthly_return is like:
2008-07-01 0.003626
2008-08-01 0.001373
2008-09-01 0.040192
2008-10-01 0.027794
2008-11-01 0.012590
2008-12-01 0.026394
2009-01-01 0.008564
2009-02-01 0.007714
2009-03-01 -0.019727
2009-04-01 0.008888
2009-05-01 0.039801
2009-06-01 0.010042
2009-07-01 0.020971
2009-08-01 0.011926
2009-09-01 0.024998
2009-10-01 0.005213
2009-11-01 0.016804
2009-12-01 0.020724
2010-01-01 0.006322
2010-02-01 0.008971
2010-03-01 0.003911
2010-04-01 0.013928
2010-05-01 0.004640
2010-06-01 0.000744
2010-07-01 0.004697
2010-08-01 0.002553
2010-09-01 0.002770
2010-10-01 0.002834
2010-11-01 0.002157
2010-12-01 0.001034
The resulting return_6m is like:
2008-07-31 0.003626
2009-01-31 0.116907
2009-07-31 0.067688
2010-01-31 0.085986
2010-07-31 0.036890
2011-01-31 0.015283
However, I want return_6m to cover 6-month periods starting from 7/2008, like the following:
2008-12-31 ...
2009-06-30 ...
2009-12-31 ...
2010-06-30 ...
2010-12-31 ...
I tried the different input options (e.g. loffset) in TimeGrouper, but it doesn't work.
Any suggestion would be really appreciated!
The problem can be solved by adding closed='left':
df.groupby(pd.TimeGrouper('6M', closed='left')).aggregate(numpy.sum)
TimeGrouper, which is suggested in other answers, is deprecated and will be removed from pandas; it is replaced by Grouper. So a solution to your question using Grouper is:
df.groupby(pd.Grouper(freq='6M', closed='left')).aggregate(numpy.sum)
This is a workaround for what seems to be a bug, but give it a try and see if it works for you.
In [121]: ts = pandas.date_range('7/1/2008', periods=30, freq='MS')
In [122]: df = pandas.DataFrame(pandas.Series(range(len(ts)), index=ts))
In [124]: df[0] += 1
In [125]: df
Out[125]:
0
2008-07-01 1
2008-08-01 2
2008-09-01 3
2008-10-01 4
2008-11-01 5
2008-12-01 6
2009-01-01 7
2009-02-01 8
2009-03-01 9
2009-04-01 10
2009-05-01 11
2009-06-01 12
2009-07-01 13
2009-08-01 14
2009-09-01 15
2009-10-01 16
2009-11-01 17
2009-12-01 18
2010-01-01 19
2010-02-01 20
2010-03-01 21
2010-04-01 22
2010-05-01 23
2010-06-01 24
2010-07-01 25
2010-08-01 26
2010-09-01 27
2010-10-01 28
2010-11-01 29
2010-12-01 30
I've used integers to help confirm that the sums are correct. The workaround that seems to work is to add a month to the front of the dataframe to trick the TimeGrouper into doing what you need.
In [127]: df2 = pandas.DataFrame([0], index = [df.index.shift(-1, freq='MS')[0]])
In [129]: df2.append(df).groupby(pandas.TimeGrouper(freq='6M')).aggregate(numpy.sum)[1:]
Out[129]:
0
2008-12-31 21
2009-06-30 57
2009-12-31 93
2010-06-30 129
2010-12-31 165
Note the final [1:] is there to trim off the first group.