Multiline regex for a Ruby on Rails log file - sed

I work with a Ruby on Rails log file which looks like the following example:
[...]
Started GET "/staff/sign_in" for 22.22.22.22 at 2014-02-16 03:39:32 -0800
Processing by Staffer::SessionsController#new as */*
Rendered layouts/_compatible_browsers.html.erb (0.9ms)
Rendered layouts/headers/_guest.html.erb (0.6ms)
Cache digest for layouts/social_media_footer_link.html: bc9b2db49cc435f550be0f0dffe79548
Cache digest for layouts/_footer.html: 87dcaa136f1edad80dd5eb5a4b5dde82
Read fragment views/staffer_footer/87dcaa136f1edad80dd5eb5a4b5dde82/87dcaa136f1edad80dd5eb5a4b5dde82 (4.6ms)
Rendered layouts/_footer.html.erb (7.4ms)
Completed 200 OK in 527ms (Views: 445.2ms | ActiveRecord: 1.0ms)
Started GET "/staff/sign_in" for 22.22.22.22 at 2014-02-16 03:49:32 -0800
Started GET "/staff/sign_in" for 22.22.22.22 at 2014-02-16 03:59:32 -0800
Processing by Staffer::SessionsController#new as */*
Rendered layouts/_compatible_browsers.html.erb (0.9ms)
Rendered layouts/headers/_guest.html.erb (0.6ms)
Cache digest for layouts/social_media_footer_link.html: bc9b2db49cc435f550be0f0dffe79548
Read fragment views/staffer_footer/87dcaa136f1edad80dd5eb5a4b5dde82/87dcaa136f1edad80dd5eb5a4b5dde82 (4.6ms)
Rendered layouts/_footer.html.erb (7.4ms)
Completed 200 OK in 527ms (Views: 445.2ms | ActiveRecord: 1.0ms)
[...]
How can I tell sed to give me the following output for that given example?
Started GET "/staff/sign_in" for 22.22.22.22 at 2014-02-16 03:39:32 -0800; Completed 200 OK in 527ms (Views: 445.2ms | ActiveRecord: 1.0ms)
Started GET "/staff/sign_in" for 22.22.22.22 at 2014-02-16 03:59:32 -0800; Completed 200 OK in 527ms (Views: 445.2ms | ActiveRecord: 1.0ms)

This might work for you (GNU sed):
sed -n '/Started GET/{h;d};/Completed 200 OK/{H;g;s/\n/; /p}' file
A more portable form puts a ; before each closing brace, which non-GNU seds require:
sed -n '/Started GET/{h;d;};/Completed 200 OK/{H;g;s/\n/; /p;}' file
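For comparison, the same idea (remember the most recent "Started GET" line, then join it to the next "Completed 200 OK" line) can be sketched in Python. This is only an illustration, not a replacement for the sed one-liner: the function name pair_requests and the shortened sample lines are made up here, and unlike the sed script it forgets the remembered line after each match.

```python
def pair_requests(lines):
    """Yield 'Started ...; Completed ...' for each request that completed."""
    started = None
    for raw in lines:
        line = raw.rstrip("\n")
        if "Started GET" in line:
            # Like sed's h: keep only the most recent start line.
            started = line
        elif "Completed 200 OK" in line and started is not None:
            # Like sed's H;g;s/\n/; /p: join the pair and emit it.
            yield started + "; " + line
            started = None  # assumption: do not reuse a start line twice


# Shortened, hypothetical sample modelled on the log in the question.
sample_log = [
    'Started GET "/staff/sign_in" at 03:39:32',
    "Processing by Staffer::SessionsController#new as */*",
    "Completed 200 OK in 527ms",
    'Started GET "/staff/sign_in" at 03:49:32',
    'Started GET "/staff/sign_in" at 03:59:32',
    "Completed 200 OK in 527ms",
]

for row in pair_requests(sample_log):
    print(row)
```

The 03:49:32 request never completes, so it is dropped, which matches the output asked for in the question.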

Why is wget getting just an empty folder?

I am trying to mirror a site index similar to
https://index.gd.workers.dev/
using wget. I run this command:
wget -e robots=off --content-on-error --mirror -np -R "index.html*" https://index.gd.workers.dev/
It's giving this output:
--2020-03-11 01:20:05-- http://index.gd.workers.dev/
Resolving index.gd.workers.dev (index.gd.workers.dev)... 104.31.87.133, 104.31.86.133, 2606:4700:3035::681f:5785, ...
Connecting to index.gd.workers.dev (index.gd.workers.dev)|104.31.87.133|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 361 [text/html]
Saving to: ‘index.gd.workers.dev/index.html.tmp’
index.gd.workers.de 100%[===================>] 361 --.-KB/s in 0s
Last-modified header missing -- time-stamps turned off.
2020-03-11 01:20:05 (2.1 MB/s) - ‘index.gd.workers.dev/index.html.tmp’ saved [361/361]
Removing index.gd.workers.dev/index.html.tmp since it should be rejected.
FINISHED --2020-03-11 01:20:05--
Total wall clock time: 0.1s
Downloaded: 1 files, 361 in 0s (2.1 MB/s)
The end result is just a folder with the site name and nothing inside it. What am I doing wrong here? Is there any other way to mirror the directory?

UnicodeDecodeError when pushing to CouchDB: 'utf8' codec can't decode byte 0xe9

I am using CouchDB and couchapp on Windows.
I'm working on a professor's ongoing project, https://github.com/Hypertopic/Tire-a-part, and I'm currently trying to set up the app on my computer.
When I do:
couchapp push http://127.0.0.1:5984/tire-a-part
I get an error:
Traceback (most recent call last):
File "couchapp\dispatch.pyc", line 48, in dispatch
File "couchapp\dispatch.pyc", line 92, in _dispatch
File "couchapp\commands.pyc", line 79, in push
File "couchapp\localdoc.pyc", line 123, in push
File "couchapp\client.pyc", line 294, in save_doc
File "json\__init__.pyc", line 231, in dumps
File "json\encoder.pyc", line 201, in encode
File "json\encoder.pyc", line 264, in iterencode
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 1: invalid continuation byte
My professor and my friends all have Macs and don't have this problem.
After a few hours searching the net for similar problems, I understand that it is an encoding error, but I don't understand what is incorrectly encoded or what I should do about it.
Thanks
Edit: I have discovered couchapp's debug option. It gives much more detail, but I still don't really understand it, as this is my first time with couchapp and CouchDB. This is the last part of the debug output, as I don't think the beginning is important:
2018-04-14 12:42:16 [DEBUG] push spec/samples/scopus.bib
2018-04-14 12:42:16 [DEBUG] push spec/spec_helper.rb
2018-04-14 12:42:16 [DEBUG] Resource uri: http://127.0.0.1:5984/tire-a-part
2018-04-14 12:42:16 [DEBUG] Request: GET _design/Tire-a-part
2018-04-14 12:42:16 [DEBUG] Headers: {'Accept': 'application/json', 'User-Agent': 'couchapp/0.7.5'}
2018-04-14 12:42:16 [DEBUG] Params: {}
2018-04-14 12:42:16 [DEBUG] Start to perform request: GET 127.0.0.1:5984 /tire-a-part/_design/Tire-a-part
2018-04-14 12:42:16 [DEBUG] Send headers: ['GET /tire-a-part/_design/Tire-a-part HTTP/1.1\r\n', 'Host: 127.0.0.1:5984\r\n', 'User-Agent: restkit/3.0.4\r\n', 'Accept-Encoding: identity\r\n', 'Accept: application/json\r\n']
2018-04-14 12:42:16 [DEBUG] Start to parse response
2018-04-14 12:42:16 [DEBUG] Got response: 404 Object Not Found
2018-04-14 12:42:16 [DEBUG] headers: [MultiDict([('X-CouchDB-Body-Time','0'),('X-Couch-Request-ID', '5ab9eee6cb'), ('Server', 'CouchDB/2.1.1 (Erlang OTP/18)'), ('Date', 'Sat, 14 Apr 2018 10:42:16 GMT'), ('Content-Type','application/json'), ('Content-Length', '41'), ('Cache-Control', 'must-revalidate')])]
2018-04-14 12:42:16 [DEBUG] return response class
2018-04-14 12:42:16 [DEBUG] release connection
2018-04-14 12:42:16 [DEBUG] C:\Users\jules\Desktop\LO10 projet\Tire-a-part\vendor don't exist
2018-04-14 12:42:16 [CRITICAL] 'utf8' codec can't decode byte 0xe9 in position 1: invalid continuation byte
I compared this with what my friend got on a Mac, and it is exactly the same except for the [CRITICAL] line: on the Mac, the "vendor don't exist" line is followed by a put of _design/Tire-a-part instead.
It really does seem to have been an encoding error with 'é'. There were 2 files with 'é' in their names. After changing it to 'e', the push command works. The app doesn't work, but that's a story for another day...
I don't have the answer, but I tried something: I started a Python 3.5 command line, declared a variable byte='\xe9', and printed it with print(byte). As can be seen below, 0xe9 appears to be the é character (strictly speaking, in Python 3 '\xe9' is a one-character string holding the code point U+00E9, not a raw byte):
$ python3.5
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> byte='\xe9'
>>> print(byte)
é
>>>
I'm not sure why Windows has a problem with the é character while macOS works fine.
On a Linux shell command line, when I put the é character in a file and take a hex dump of the file, I see that é is actually stored as c3 a9 (the 0a is the newline, or line feed):
$ echo 'é' > file
$ cat file
é
$ hd file
00000000 c3 a9 0a |...|
00000003
Therefore, I think the problem is that the é character is encoded with one byte of 0xe9 rather than two bytes of 0xc3 0xa9.
I played around with Go to see where the e9 comes from and noticed that the Unicode code point for é is U+00E9 (\u00e9), whose UTF-8 encoding is the two bytes \xc3\xa9, i.e. 0xc3 and 0xa9. A lone 0xe9 byte is how é is written in single-byte encodings such as Latin-1 or Windows-1252, so on your Windows machine the file name is presumably reaching couchapp in such an encoding and then being decoded as UTF-8.
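The byte-level difference can be reproduced in a few lines of Python 3. This is only a sketch under assumptions: cp1252 is assumed to be the Windows code page involved, and the file name tél.txt is made up for illustration; the actual names in the project are not known here.

```python
def encodings_of(text):
    """Return the bytes of text under UTF-8 and under Windows-1252."""
    return text.encode("utf-8"), text.encode("cp1252")


def utf8_decode_error(raw):
    """Return the UnicodeDecodeError raised if raw is not valid UTF-8, else None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None


# "\u00e9" is the character é, written as an escape to keep this file ASCII.
utf8_bytes, cp1252_bytes = encodings_of("\u00e9")
print(hex(ord("\u00e9")), utf8_bytes, cp1252_bytes)

# Hypothetical file name, encoded the way a single-byte code page would store it.
legacy_name = "t\u00e9l.txt".encode("cp1252")
err = utf8_decode_error(legacy_name)
print(err)
```

Decoding the single-byte form as UTF-8 fails at the 0xe9 byte with "invalid continuation byte", the same kind of failure couchapp reports.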

Why does WWW::Mechanize fail with "X-Died: Illegal field name 'X-Meta-Twitter:title'"?

Why does WWW::Mechanize have blank content after getting the following URL? Using a browser or curl there is a full HTML page retrieved.
use strict;
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new;
$mech->get("http://www.belizejudiciary.org/web/judgements2/");
print $mech->content;    # prints nothing
Here is the dump of the response:
HTTP/1.1 200 OK
Connection: close
Date: Fri, 10 Feb 2017 00:51:47 GMT
Server: Apache/2.4
Content-Type: text/html; charset=UTF-8
Client-Aborted: die
Client-Date: Fri, 10 Feb 2017 00:51:48 GMT
Client-Peer: 98.129.229.64:80
Client-Response-Num: 1
Client-Transfer-Encoding: chunked
Link: <http://www.belizejudiciary.org/web/wp-json/>; rel="https://api.w.org/"
Link: <http://www.belizejudiciary.org/web/?p=468>; rel=shortlink
Set-Cookie: X-Mapping-hepadkon=FAB86566672CEB74D66B2818CA030616; path=/
X-Died: Illegal field name 'X-Meta-Twitter:title' at /usr/local/lib/perl5/site_perl/5.16.3/sun4-solaris/HTML/HeadParser.pm line 207.
X-Pingback: http://www.belizejudiciary.org/web/xmlrpc.php
I have version 3.70 of HTML::Parser installed.
Your dump shows that there was an error parsing the response:
X-Died: Illegal field name 'X-Meta-Twitter:title' at /usr/local/lib/perl5/site_perl/5.16.3/sun4-solaris/HTML/HeadParser.pm line 207.
This is caused by a bug in HTML::HeadParser:
<meta> tags can have name attributes with colons in them, and this is perfectly valid. But HTML::HeadParser then tries to register these as X-Meta-<name> headers using HTTP::Headers. Newer versions of HTTP::Headers (since 6.05) have stricter checks for headers, and will refuse them if they contain colons.
This was fixed in version 3.71 of the HTML-Parser distribution, so you should upgrade.

How do I get a refresh token for command line gsutil to work?

I use gsutil to transfer files from a Windows machine to Google Cloud Storage.
I have not used it for more than 6 months and now when I try it I get:
Failure: invalid_grant
From researching this I suspect the access token is no longer valid as it has not been used for 6 months, and I need a refresh token?
I cannot seem to find how to get and use this.
thanks
Running gsutil -DD config produces the following output:
C:\Python27>python c:/gsutil/gsutil -DD config
DEBUG:boto:path=/pub/gsutil.tar.gz
DEBUG:boto:auth_path=/pub/gsutil.tar.gz
DEBUG:boto:Method: HEAD
DEBUG:boto:Path: /pub/gsutil.tar.gz
DEBUG:boto:Data:
DEBUG:boto:Headers: {}
DEBUG:boto:Host: storage.googleapis.com
DEBUG:boto:Params: {}
DEBUG:boto:establishing HTTPS connection: host=storage.googleapis.com, kwargs={'timeout': 70}
DEBUG:boto:Token: None
DEBUG:oauth2_client:GetAccessToken: checking cache for key *******************************
DEBUG:oauth2_client:FileSystemTokenCache.GetToken: key=******************************* not present (cache_file= c:\users\admini~1\appdata\local\temp\2\oauth2_client-tokencache._.ea******************************)
DEBUG:oauth2_client:GetAccessToken: token from cache: None
DEBUG:oauth2_client:GetAccessToken: fetching fresh access token...
INFO:oauth2client.client:Refreshing access_token
connect: (accounts.google.com, 443)
send: 'POST /o/oauth2/token HTTP/1.1\r\nHost: accounts.google.com\r\nContent-Length: 177\r\ncontent-type: application/x-www-form-urlencoded\r\naccept-encoding: gzip, deflate\r\nuser-agent: Python-httplib2/0.7.7 (gzip)\r\n\r\nclient_secret=******************&grant_type=refresh_token&refresh_token=****************************************&client_id=****************.apps.googleusercontent.com'
reply: 'HTTP/1.1 400 Bad Request\r\n'
header: Content-Type: application/json; charset=utf-8
header: Cache-Control: no-cache, no-store, max-age=0, must-revalidate
header: Pragma: no-cache
header: Expires: Fri, 01 Jan 1990 00:00:00 GMT
header: Date: Thu, 08 May 2014 02:02:21 GMT
header: Content-Disposition: attachment; filename="json.txt"; filename*=UTF-8''json.txt
header: Content-Encoding: gzip
header: X-Content-Type-Options: nosniff
header: X-Frame-Options: SAMEORIGIN
header: X-XSS-Protection: 1; mode=block
header: Server: GSE
header: Alternate-Protocol: 443:quic
header: Transfer-Encoding: chunked
INFO:oauth2client.client:Failed to retrieve access token: { "error" : "invalid_grant" }
Traceback (most recent call last):
File "c:/gsutil/gsutil", line 83, in <module>
gslib.__main__.main()
File "c:\gsutil\gslib\__main__.py", line 151, in main
command_runner.RunNamedCommand('ver', ['-l'])
File "c:\gsutil\gslib\command_runner.py", line 95, in RunNamedCommand
self._MaybeCheckForAndOfferSoftwareUpdate(command_name, debug)):
File "c:\gsutil\gslib\command_runner.py", line 181, in _MaybeCheckForAndOfferSoftwareUpdate
cur_ver = LookUpGsutilVersion(suri_builder.StorageUri(GSUTIL_PUB_TARBALL))
File "c:\gsutil\gslib\util.py", line 299, in LookUpGsutilVersion
obj = uri.get_key(False)
File "c:\gsutil\third_party\boto\boto\storage_uri.py", line 342, in get_key
generation=self.generation)
File "c:\gsutil\third_party\boto\boto\gs\bucket.py", line 102, in get_key
query_args_l=query_args_l)
File "c:\gsutil\third_party\boto\boto\s3\bucket.py", line 176, in _get_key_internal
query_args=query_args)
File "c:\gsutil\third_party\boto\boto\s3\connection.py", line 547, in make_request
retry_handler=retry_handler
File "c:\gsutil\third_party\boto\boto\connection.py", line 947, in make_request
retry_handler=retry_handler)
File "c:\gsutil\third_party\boto\boto\connection.py", line 838, in _mexe
request.authorize(connection=self)
File "c:\gsutil\third_party\boto\boto\connection.py", line 377, in authorize
connection._auth_handler.add_auth(self, *********)
File "c:\gsutil\gslib\third_party\oauth2_plugin\oauth2_plugin.py", line 22, in add_auth
self.oauth2_client.GetAuthorizationHeader()
File "c:\gsutil\gslib\third_party\oauth2_plugin\oauth2_client.py", line 338, in GetAuthorizationHeader
return 'Bearer %s' % self.GetAccessToken().token
File "c:\gsutil\gslib\third_party\oauth2_plugin\oauth2_client.py", line 309, in GetAccessToken
access_token = self.FetchAccessToken()
File "c:\gsutil\gslib\third_party\oauth2_plugin\oauth2_client.py", line 435, in FetchAccessToken
credentials.refresh(http)
File "c:\gsutil\third_party\google-api-python-client\oauth2client\client.py", line 516, in refresh
self._refresh(http.request)
File "c:\gsutil\third_party\google-api-python-client\oauth2client\client.py", line 653, in _refresh
self._do_refresh_request(http_request)
File "c:\gsutil\third_party\google-api-python-client\oauth2client\client.py", line 710, in _do_refresh_request
raise AccessTokenRefreshError(error_msg)
oauth2client.client.AccessTokenRefreshError: invalid_grant
You can ask gsutil to configure itself. Go to the directory with gsutil and run this:
c:\gsutil> python gsutil config
gsutil will lead you through the steps of setting up your credentials.
That said, access tokens normally last only about half an hour. It's more likely that the previously configured refresh token was revoked for some reason. Alternatively, new tokens can only be requested at a certain rate; it's possible your account has been requesting a great many refresh tokens for some reason and has been temporarily rate-limited by the access service.
The command to authenticate is now
$ gcloud auth login
That should refresh your grant and get you going again.
You may also want to run
$ gcloud components update
to update your installation.
Brandon Yarbrough gave me suggestions which solved this problem. He suspected that the .boto file was corrupted and suggested I delete it and run gsutil config again. I did this and it solved the problem.

Swift TempAuth returns 404 on HEAD of the account

I am new to Swift and am trying to install it on my CentOS 6.5 VM. So far I have done the following:
Installed the latest Swift release (1.12.0) and python-swiftclient (2.0.2) and their dependencies
Prepared and mounted my drive (a separate device formatted as XFS) at /svr/node/d1
Created the rings and added the device to the rings (account, container, object)
Built the rings, which generates one .ring.gz file for each ring, and placed them in /etc/swift
Configured hash_path_prefix for the proxy
Set up TempAuth and added a new user 'myaccount:me' with password 'pa'
Started the proxy and account servers
I expected the following to succeed:
swift -A http://localhost:8080/auth/v1.0 -U myaccount:me -K pa stat
but the command told me 'Account not found'. To see more detail, I ran
swift --debug -v -A http://localhost:8080/auth/v1.0 -U myaccount:me -K pa stat
The output is:
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:requests.packages.urllib3.connectionpool:"GET /auth/v1.0 HTTP/1.1" 200 0
DEBUG:swiftclient:REQ: curl -i http://localhost:8080/auth/v1.0 -X GET
DEBUG:swiftclient:RESP STATUS: 200 OK
DEBUG:swiftclient:RESP HEADERS: [('content-length', '0'), ('x-trans-id', 'tx88b6b6b71ec14c3393248-00530de039'), ('x-auth-token', 'AUTH_tkdc7e842046e9469da324f2ec82c80a92'), ('x-storage-token', 'AUTH_tkdc7e842046e9469da324f2ec82c80a92'), ('date', 'Wed, 26 Feb 2014 12:38:17 GMT'), ('x-storage-url', 'http://localhost:8080/v1/AUTH_myaccount'), ('content-type', 'text/html; charset=UTF-8')]
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:requests.packages.urllib3.connectionpool:"HEAD /v1/AUTH_myaccount HTTP/1.1" 404 0
INFO:swiftclient:REQ: curl -i http://localhost:8080/v1/AUTH_myaccount -I -H "X-Auth-Token: AUTH_tkdc7e842046e9469da324f2ec82c80a92"
INFO:swiftclient:RESP STATUS: 404 Not Found
INFO:swiftclient:RESP HEADERS: [('date', 'Wed, 26 Feb 2014 12:38:17 GMT'), ('content-length', '0'), ('content-type', 'text/html; charset=UTF-8'), ('x-trans-id', 'tx553c40e63c69470e9d146-00530de039')]
ERROR:swiftclient:Account HEAD failed: http://localhost:8080:8080/v1/AUTH_myaccount 404 Not Found
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/swiftclient/client.py", line 1192, in _retry
rv = func(self.url, self.token, *args, **kwargs)
File "/usr/lib/python2.6/site-packages/swiftclient/client.py", line 469, in head_account
http_response_content=body)
ClientException: Account HEAD failed: http://localhost:8080:8080/v1/AUTH_myaccount 404 Not Found
Account not found
I figured it out myself: in proxy-server.conf, add these two lines (they belong in the [app:proxy-server] section):
allow_account_management = true
account_autocreate = true
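To double-check that both options are actually set, the config can be read with Python's standard configparser. This is a rough sketch: the helper name missing_proxy_options and the sample text are invented for illustration, and the section name app:proxy-server is an assumption about where the options live in your file.

```python
import configparser

REQUIRED = ("allow_account_management", "account_autocreate")


def missing_proxy_options(conf_text, section="app:proxy-server"):
    """Return the required options that are absent or not enabled in section."""
    # interpolation=None so that any % characters in the file are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    missing = []
    for name in REQUIRED:
        if not parser.has_section(section):
            missing.append(name)
            continue
        value = parser.get(section, name, fallback="false")
        if value.strip().lower() not in ("true", "yes", "1", "on"):
            missing.append(name)
    return missing


# Hypothetical config fragment with only one of the two options set.
sample = """
[app:proxy-server]
use = egg:swift#proxy
allow_account_management = true
"""

print(missing_proxy_options(sample))
```

For a real check, pass the contents of /etc/swift/proxy-server.conf instead of the sample string, and restart the proxy after editing the file.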