If I have a text file with the following messages, how can I strip out the value after "55=" and "48=" for each message, and populate those values into a new template message where the "55=" and "48=" are from each message in the file?
Source file:
Recv: 8=FIX.4.2|9=00258|35=d|49=XXXX|56=XXXX|34=4|52=20151026-16:04:21.972|320=1|322=1:2|323=4|393=43389|15=USD|55=H8X|167=FUT|200=201603|541=20160301|205=1|48=17888384001387914931|207=CME|100=XCME|16552=1|16554=20|454=3|455=HW|456=99|455=H8XH6|456=98|455=H8X Mar16|456=97|10=223|
Recv: 8=FIX.4.2|9=00249|35=d|49=XXXX|56=XXXX|34=5|52=20151026-16:04:21.972|320=1|322=1:3|323=4|393=43389|15=USD|55=BQS|167=FUT|200=201601|541=20160101|205=1|48=17875701615154475972|207=CME|100=XNYM|16552=1|454=3|455=09|456=99|455=BQSF6|456=98|455=BQS Jan16|456=97|10=065|
Recv: 8=FIX.4.2|9=00264|35=d|49=xxx|56=xxx|34=6|52=20151026-16:04:21.972|320=1|322=1:4|323=4|393=43389|15=USD|55=T1A|167=FUT|200=201512|541=20151201|205=1|48=17665452254677820169|207=CME|100=XCBT|16552=0.001|16554=2000|454=3|455=06|456=99|455=T1AZ5|456=98|455=T1A Dec15|456=97|10=141|
Recv: 8=FIX.4.2|9=00268|35=d|49=xxxx|56=xxxx|34=7|52=20151026-16:04:21.972|320=1|322=1:5|323=4|393=43389|15=USD|55=T03|167=FUT|200=201511|541=20151101|205=1|48=17593596635603008360|207=CME|100=XNYM|16460=50|16552=5|16554=0.4|454=3|455=08|456=99|455=T03X5|456=98|455=T03 Nov15|456=97|10=088|
Systemically create the same template below with new 55= and 48 values from each message in the source file?
8=FIX.4.2|9=195|35=V|49=XXXX|56=XXXX|34=2|52=20151027-21:05:58|262=1|263=1|264=0|265=1|266=Y|267=5|269=0|269=1|269=2|269=4|269=5|146=1|55=HW|48=17888384001387914931|167=FUT|207=CME|10=247|
8=FIX.4.2|9=195|35=V|49=XXXX|56=XXXX|34=2|52=20151027-21:05:58|262=1|263=1|264=0|265=1|266=Y|267=5|269=0|269=1|269=2|269=4|269=5|146=1|55=BQS|48=17875701615154475972|167=FUT|207=CME|10=247|
see if this grep line helps you:
grep -oE '(48|55)=[^|]*' file
With your text input, the line above gives:
55=H8X
48=17888384001387914931
55=HW
55=H8XH6
55=H8X Mar16
55=BQS
48=17875701615154475972
55=09
55=BQSF6
55=BQS Jan16
55=T1A
48=17665452254677820169
55=06
55=T1AZ5
55=T1A Dec15
55=T03
48=17593596635603008360
55=08
55=T03X5
55=T03 Nov15
55= and 48 values from each message in the source file?
55=HW
48=17888384001387914931
55=BQS
48=17875701615154475972
Related
I am trying to run the chatterbot's TwitterTrainer on a separate program like so:
from chatterbot import ChatBot
from chatterbot.trainers import TwitterTrainer
from settings import TWITTER
import logging
# Comment out the following line to disable verbose logging
logging.basicConfig(level=logging.INFO)
chatbot = ChatBot("TwitterBot",
logic_adapters=[
"chatterbot.logic.BestMatch"
],
input_adapter="chatterbot.input.TerminalAdapter",
output_adapter="chatterbot.output.TerminalAdapter",
database="./twitter-database.db",
twitter_consumer_key=TWITTER["CONSUMER_KEY"],
twitter_consumer_secret=TWITTER["CONSUMER_SECRET"],
twitter_access_token_key=TWITTER["ACCESS_TOKEN"],
twitter_access_token_secret=TWITTER["ACCESS_TOKEN_SECRET"],
trainer="chatterbot.trainers.TwitterTrainer",
random_seed_word="random"
)
chatbot.train()
chatbot.logger.info('Trained database generated successfully!')
And i get errors that look like that:
File "C:\Python27\lib\json\decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "C:\Python27\lib\json\decoder.py", line 380, in raw_decode
obj, end = self.scan_once(s, idx) UnicodeDecodeError: 'utf8' codec can't decode byte 0x85 in position 94: invalid start byte
This program doesn't run more than 3 seconds straight, but some tweets are written to the twitter-database.db until exception occurs.
Also when looking at the trainer.py i saw this:
# TODO: Handle non-ascii characters properly
Any ideas about why this happens and how can i fix this?
Could you try to add Python Source Code Encoding top of your file # -*- coding: utf-8 -*-. These type error will occurs due to this. More information available here http://chatterbot.readthedocs.io/en/stable/encoding.html#fixing-encoding-errors
I have to read 3 different lines from log files based on some text and then output the fields in a csv file.
sample log data:-
20110607 095826 [.] !! Begin test. Script filename/text.txt
20110607 095826 [.] Full path: filename/test/text.txt
20110607 095828 [.] FAILED: Test Failed()..
i have to read file name after !!Begin test. Script. this is my conf file
filter{
grok
{
match => {"message" => "%{BASE10NUM:Date}%{SPACE:pat}{BASE10NUM:Number}%
{SPACE:pat}[.]%{SPACE:pat}%{SPACE:pat}!! Begin test. Script%
{SPACE:pat}%{GREEDYDATA:file}"
}
overwrite => ["message"]
}
if "_grokparserfailure" in [tags]
{
drop{}
}
}
but its not giving me single record, its parsing full log file in json format no parsed field.
I met an error when running codes at the bottom. It's like a simple ftp.
I use python2.6.6 and CentOS release 6.8
In most linux server, it gets right results like this:(I'm very sorry that I have just sign up and couldn't )
Clinet:
[root#Test ftp]# python client.py
path:put|/home/aaa.txt
Server:
[root#Test ftp]# python server.py
connected...
pre_data:put|aaa.txt|4
cmd: put
file_name: aaa.txt
file_size: 4
upload successed.
But I get errors in some server(such as my own VM in my PC). I have done lots of tests(python2.6/python2.7, Centos6.5/Centos6.7) and found this error is not because them. Here is the error imformation:
[root#Lewis-VM ftp]# python server.py
connected...
pre_data:put|aaa.txt|7sdfsdf ###Here gets the wrong result, "sdfsdf" is the content of /home/aaa.txt and it shouldn't be sent here to 'file_size' and so it cause the "ValueError" below
cmd: put
file_name: aaa.txt
file_size: 7sdfsdf
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 10699)
Traceback (most recent call last):
File "/usr/lib64/python2.6/SocketServer.py", line 570, in process_request_thread
self.finish_request(request, client_address)
File "/usr/lib64/python2.6/SocketServer.py", line 332, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib64/python2.6/SocketServer.py", line 627, in __init__
self.handle()
File "server.py", line 30, in handle
if int(file_size)>recv_size:
ValueError: invalid literal for int() with base 10: '7sdfsdf\n'
What's more, I found that if I insert a time.sleep(1) between sk.send(cmd+"|"+file_name+'|'+str(file_size)) and sk.send(data) in client.py, the error will disappear. I have said that I did tests in different system and python versions and the error is not because them. So I guess that is it because of some system configs? I have check about socket.send() and socket.recv() in python.org but fount nothing helpful. So could somebody help me to explain why this happend?
The code are here:
#!/usr/bin/env python
#coding:utf-8
################
#This is server#
################
import SocketServer
import os
class MyServer(SocketServer.BaseRequestHandler):
def handle(self):
base_path = '/home/ftp/file'
conn = self.request
print 'connected...'
while True:
#####receive pre_data: we should get data like 'put|/home/aaa|7'
pre_data = conn.recv(1024)
print 'pre_data:' + pre_data
cmd,file_name,file_size = pre_data.split('|')
print 'cmd: ' + cmd
print 'file_name: '+ file_name
print 'file_size: '+ file_size
recv_size = 0
file_dir = os.path.join(base_path,file_name)
f = file(file_dir,'wb')
Flag = True
####receive 1024bytes each time
while Flag:
if int(file_size)>recv_size:
data = conn.recv(1024)
recv_size+=len(data)
else:
recv_size = 0
Flag = False
continue
f.write(data)
print 'upload successed.'
f.close()
instance = SocketServer.ThreadingTCPServer(('127.0.0.1',9999),MyServer)
instance.serve_forever()
#!/usr/bin/env python
#coding:utf-8
################
#This is client#
################
import socket
import sys
import os
ip_port = ('127.0.0.1',9999)
sk = socket.socket()
sk.connect(ip_port)
while True:
input = raw_input('path:')
#####we should input like 'put|/home/aaa.txt'
cmd,path = input.split('|')
file_name = os.path.basename(path)
file_size=os.stat(path).st_size
sk.send(cmd+"|"+file_name+'|'+str(file_size))
send_size = 0
f= file(path,'rb')
Flag = True
#####read 1024 bytes and send it to server each time
while Flag:
if send_size + 1024 >file_size:
data = f.read(file_size-send_size)
Flag = False
else:
data = f.read(1024)
send_size+=1024
sk.send(data)
f.close()
sk.close()
The TCP is a stream of data. That is the problem. TCP do not need to keep message boundaries. So when a client calls something like
connection.send("0123456789")
connection.send("ABCDEFGHIJ")
then a naive server like
while True;
data = conn.recv(1024)
print data + "_"
may print any of:
0123456789_ABCDEFGHIJ_
0123456789ABCDEFGHIJ_
0_1_2_3_4_5_6_7_8_9_A_B_C_D_E_F_G_H_I_J_
The server has no chance to recognize how many sends client called because the TCP stack at client side just inserted data to a stream and the server must be able to process the data received in different number of buffers than the client used.
Your server must contain a logic to separate the header and the data. All of application protocols based on TCP use a mechanism to identify application level boundaries. For example HTTP separates headers and body by an empty line and it informs about the body length in a separate header.
Your program works correctly when server receives a header with the command, name and size in a separate buffer it it fails when client is fast enough and push the data into stream quickly and the server reads header and data in one chunk.
I'm using logstash(2.3.2) to read gz file by using gzip_lines codec.
The log file example (sample.log) is
127.0.0.2 - - [11/Dec/2013:00:01:45 -0800] "GET /xampp/status.php HTTP/1.1" 200 3891 "http://cadenza/xampp/navi.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
The command I used to append to a gz file is:
cat sample.log | gzip -c >> s.gz
The logstash.conf is
input {
file {
path => "./logstash-2.3.2/bin/s.gz"
codec => gzip_lines { charset => "ISO-8859-1"}
}
}
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
#match => { "message" => "message: %{GREEDYDATA}" }
}
#date {
# match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
#}
}
output {
stdout { codec => rubydebug }
}
I have installed gzip_line plugin with bin/logstash-plugin install logstash-codec-gzip_lines
start logstash with ./logstash -f logstash.conf
When I feed s.gz with
cat sample.log | gzip -c >> s.gz
I expect that the console prints the data. but there is nothing print out.
I have tried it on mac and ubuntu, and get same result.
Is anything wrong with my code?
I checked the code for gzip_lines and it seemed obvious to me that this plugin is not working. At least for version 2.3.2. May be it is outdated. Because it does not implement the methods specified here:
https://www.elastic.co/guide/en/logstash/2.3/_how_to_write_a_logstash_codec_plugin.html
So current internal working is like that:
file input plugin reads file line by line and send it to codec.
gzip_lines codec tryies to create a new GzipReader object with GzipReader.new(io)
It then go through the reader line by line to create events.
Because you specify a gzip file, file input plugin tries to read gzip file as a regular file and sends lines to codec. Codec tries to create a GzipReader with that string and it fails.
You can modify it to work like that:
Create a file that contains list of gzip files:
-- list.txt
/path/to/gzip/s.gz
Give it to file input plugin:
file {
path => "/path/to/list/list.txt"
codec => gzip_lines { charset => "ISO-8859-1"}
}
Changes are:
Open vendor/bundle/jruby/1.9/gems/logstash-codec-gzip_lines-2.0.4/lib/logstash/codecs/gzip_lines.r file. Add register method:
public
def register
#converter = LogStash::Util::Charset.new(#charset)
#converter.logger = #logger
end
And in method decode change:
#decoder = Zlib::GzipReader.new(data)
as
#decoder = Zlib::GzipReader.open(data)
The disadvantage of this approach is it wont tail your gzip file but the list file. So you will need to create a new gzip file and append it to list.
I had a variant of this problem where I needed to decode bytes in a files to an intermediate string to prepare for a process input that only accepts strings.
The fact that encoding / decoding issues were ignored in Pyhton 2 is actually really bad IMHO. you may end up with various corrupt data problems especially if you needed to re-encode the string back into data.
using ISO-8859-1 works for both gz and text files alike. while utf-8 only worked for text files. I haven't tried it for png's yet.
Here's an example of what worked for me
data = os.read(src, bytes_needed)
chunk += codecs.decode(data,'ISO-8859-1')
# do the needful with the chunk....
I want to use restful in my ci 3.03 application:
I found this tutplus tutorial
I downloaded codeigniter-restserver-master.zip file and copied Format.php and REST_Controller.php(#version 3.0.0) files into /application/libraries/REST directory
I created control application/controllers/api/Users.php :
require_once("application/libraries/REST/REST_Controller.php");
require_once("application/libraries/REST/Format.php");
class Users extends REST_Controller
{
//protected $rest_format = 'json';
function users_get()
{
//$users = $this->user_model->get_all();
$filter_username= $this->get('filter_username');
$filter_user_group= $this->get('filter_user_group');
$filter_active= $this->get('filter_active');
$sort= $this->get('sort');
$sort_direction= $this->get('sort_direction');
//, $filter_user_group, $filter_active, $sort, $sort_direction
$users_list = $this->muser->getUsersList(false, ''/*, $filter_username, $filter_user_group, $filter_active, $sort, $sort_direction, ''*/);
echo '<pre>'.count($users_list).'::$users_lists::'.print_r($users_list,true).'</pre>';
if($users_list)
{
$this->response($users, 200);
}
else
{
$this->response(NULL, 404);
}
}
AND RUNNING URL http://local-ci3.com/api/users I got many errors:
A PHP Error was encountered
Severity: Notice
Message: Undefined property: Users::$format
Filename: REST/REST_Controller.php
Line Number: 734
Backtrace:
File: /mnt/diskD_Work/wwwroot/ci3/application/libraries/REST/REST_Controller.php
Line: 734
Function: _error_handler
File: /mnt/diskD_Work/wwwroot/ci3/application/libraries/REST/REST_Controller.php
Line: 649
Function: response
File: /mnt/diskD_Work/wwwroot/ci3/index.php
Line: 292
Function: require_once
A PHP Error was encountered
Severity: Notice
Message: Undefined property: Users::$format
Filename: REST/REST_Controller.php
Line Number: 752
Backtrace:
File: /mnt/diskD_Work/wwwroot/ci3/application/libraries/REST/REST_Controller.php
Line: 752
Function: _error_handler
File: /mnt/diskD_Work/wwwroot/ci3/application/libraries/REST/REST_Controller.php
Line: 649
Function: response
File: /mnt/diskD_Work/wwwroot/ci3/index.php
Line: 292
Function: require_once
Actually I wanted to get some workable library to help me with REST api creation. I think that is preferable way istead of making from zero.
But is this library not workable or does it needs for some fixing? Sorry, what I missed is if this library only for ci 2?
I made search on this forum and found such hint :
I have the same problem when I load both Format.php and
Rest_Controller.php into a controller. After have a quick glance at
Format.php, it appears to be a standalone format conversion helper.
Try to just load Rest_Controller.php and see if your problem goes
away.
I commented line
//require_once("application/libraries/REST/Format.php");
in my controller, but I still get errors like :
Message: Undefined property: Users::$format.
I tried to review code of this library and see that invalid block when data are converted to json format, line 731-757 :
elseif ($data !== NULL)
{
// If the format method exists, call and return the output in that format
if (method_exists($this->format, 'to_' . $this->response->format))
{
// Set the format header
$this->output->set_content_type($this->_supported_formats[$this->response->format], strtolower($this->config->item('charset')));
$output = $this->format->factory($data)->{'to_' . $this->response->format}();
// An array must be parsed as a string, so as not to cause an array to string error
// Json is the most appropriate form for such a datatype
if ($this->response->format === 'array')
{
$output = $this->format->factory($output)->{'to_json'}();
}
}
else
{
// If an array or object, then parse as a json, so as to be a 'string'
if (is_array($data) || is_object($data))
{
$data = $this->format->factory($data)->{'to_json'}();
}
// Format is not supported, so output the raw data as a string
$output = $data;
}
}
If I tried to commented this block, but get error
Message: Array to string conversion
Looks like data are not converted in this case...
Is is possible to fix these errors?
Or can you, please, to tell me advice some codeigniter 3 REST api workable library with similar interface like library above?
Thanks!
I use that lib, work just fine. My suggestion is follow the more relevant installation instruction on github .
you also wrong place the lib file :
Tutorial say :
require(APPPATH'.libraries/REST_Controller.php');
You try :
require_once("application/libraries/REST/REST_Controller.php");
require_once("application/libraries/REST/Format.php");
No need to include the format because on line 407 the lib will load it. And also good to know on line 404 it will load the configuration (application/config/rest.php) it will be your default configuration, and also you can change it to suit your need.
Please let me know if you still got error using my answer :)