Flutter Dio - extraneous request to server - flutter

I am debugging HTTP requests to our server and decided to try the Dio Dart package. After some trials (with no difference in results from the standard http package), I decided to stop using Dio.
I did, however, happen to notice extraneous requests from a random location (traced back to China Telecom). Considering we are only just setting up the server, and the requests started showing up only after I used Dio in my Flutter app: is Dio snooping on my server?
Seen on Server
X-Forwarded-Protocol: https
X-Real-Ip: 183.136.225.35
Host: 0.0.0.0:5002
Connection: close
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE
Accept: */*
Referer: ******
Accept-Encoding: gzip
2022-10-06 15:06:06,768 [DEBUG] root:
X-Forwarded-Protocol: https
X-Real-Ip: 45.79.204.46
Host: 0.0.0.0:5002
Connection: close
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
Accept: */*
Referer: *****
Accept-Encoding: gzip
Traceroute on IP
4 142 ms 141 ms 153 ms 116.119.68.60
5 140 ms 138 ms 140 ms be6391.rcr21.b015591-1.lon13.atlas.cogentco.com [149.14.224.161]
6 139 ms 139 ms 139 ms be2053.ccr41.lon13.atlas.cogentco.com [130.117.2.65]
7 144 ms 142 ms 142 ms 154.54.61.158
8 191 ms 190 ms 190 ms chinatelecom.demarc.cogentco.com [149.14.81.226]
9 299 ms * 299 ms 202.97.13.18
10 * 316 ms * 202.97.90.30
11 * 317 ms * 202.97.24.141
12 * * * Request timed out.
13 317 ms 308 ms 320 ms 220.191.200.166
14 334 ms 354 ms * 115.233.128.133
15 * * * Request timed out.
16 * * * Request timed out.
17 * * * Request timed out.
18 * * * Request timed out.
19 * * * Request timed out.
20 * * * Request timed out.
21 325 ms 325 ms 333 ms 183.136.225.35

Related

Troubleshooting connectivity from a pod in kubernetes

I have created a pod and service, called node-port.
root@hello-client:/# nslookup node-port
Server: 10.100.0.10
Address: 10.100.0.10#53
Name: node-port.default.svc.cluster.local
Address: 10.100.183.19
From inside a pod, I can see that name resolution works.
However, the TCP connection is never established from a node.
root@hello-client:/# curl --trace-ascii - http://node-port.default.svc.cluster.local:3050
== Info: Trying 10.100.183.19:3050...
What are the likely factors contributing to failure?
What are some suggestions to troubleshoot this?
On a working node/cluster, I expect it to behave like this:
/ # curl --trace-ascii - node-port:3050
== Info: Trying 10.100.13.83:3050...
== Info: Connected to node-port (10.100.13.83) port 3050 (#0)
=> Send header, 78 bytes (0x4e)
0000: GET / HTTP/1.1
0010: Host: node-port:3050
0026: User-Agent: curl/7.83.1
003f: Accept: */*
004c:
== Info: Mark bundle as not supporting multiuse
<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
<= Recv header, 38 bytes (0x26)
0000: Server: Werkzeug/2.2.2 Python/3.8.13
<= Recv header, 37 bytes (0x25)
0000: Date: Fri, 26 Aug 2022 04:34:48 GMT
<= Recv header, 32 bytes (0x20)
0000: Content-Type: application/json
<= Recv header, 20 bytes (0x14)
0000: Content-Length: 25
<= Recv header, 19 bytes (0x13)
0000: Connection: close
<= Recv header, 2 bytes (0x2)
0000:
<= Recv data, 25 bytes (0x19)
0000: {. "hello": "world".}.
{
"hello": "world"
}
== Info: Closing connection 0
/ #
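One way to narrow this down is to test name resolution and the TCP handshake as two separate steps, since the curl trace above stops at "Trying". The following is a minimal sketch using only the Python standard library; the function name check_tcp and the 3-second timeout are assumptions made for illustration, not part of the original setup.

```python
import socket


def check_tcp(host, port, timeout=3.0):
    """Resolve host, then attempt a TCP connect to every resolved address.

    Returns a list of (address, outcome) tuples. The outcome is "connected"
    or a short description of what went wrong.
    """
    try:
        # Step 1: name resolution only.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return [(host, "resolution failed: %s" % exc)]

    results = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            # Step 2: TCP handshake against the resolved address.
            sock.connect(sockaddr)
            outcome = "connected"
        except socket.timeout:
            outcome = "timed out"
        except ConnectionRefusedError:
            outcome = "refused"
        except OSError as exc:
            outcome = "error: %s" % exc
        finally:
            sock.close()
        results.append((sockaddr[0], outcome))
    return results
```

Calling check_tcp("node-port.default.svc.cluster.local", 3050) from inside the pod would show which of the two steps fails. As a rough guide, "timed out" usually means packets are being dropped somewhere on the path, while "refused" means something answered but rejected the connection.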

Why am I getting an SSL socket timeout connecting to Keycloak?

I think this question can be rewritten to "Using Spring Boot and keycloak-spring-boot-starter, what happens after KeycloakSpringBootConfigResolver.resolve()?"
I have a custom keycloak config resolver:
public class CustomKeycloakConfigResolver
extends KeycloakSpringBootConfigResolver {
...
@Override
public KeycloakDeployment resolve(final HttpFacade.Request request) {
LOGGER.debug("-----------------------------------------------");
LOGGER.debug("Resolving Deployment for {}", request.getURI());
...
LOGGER.trace("---------- CREATING KEYCLOAK DEPLOYMENT ---------");
KeycloakDeployment keycloakDeployment =
KeycloakDeploymentBuilder.build(adapterConfig);
LOGGER.trace("---------- /CREATED KEYCLOAK DEPLOYMENT ---------");
return keycloakDeployment;
This pulls the KeycloakDeployment configuration from our database instead of from application.properties. Testing on a local docker swarm cluster, it works like a charm (but this is only on one machine, without SSL enabled, etc).
Pushing out to our QA environment (nginx on one machine with TLS termination, REST service on another, Keycloak on another, database provided by RDS), the generation of a KeycloakDeployment goes off without a hitch. This includes reaching out to .well-known/openid-configuration and resolving all of the Keycloak URLs, and printing both of the final trace() statements.
Almost immediately (within 10-30ms) after creating the KeycloakDeployment and returning it to the keycloak-spring-boot-starter framework, I receive a SocketTimeoutException. I can't tell from the exception what the system is trying to do when it throws, and I haven't been able to tell from https://github.com/keycloak/keycloak what the workflow is after the deployment is "resolved()".
So - what happens next?
1. A secured method is accessed
2. Spring Boot auth hands off to Keycloak auth
3. Keycloak auth generates a custom KeycloakDeployment
4. KeycloakDeployment is resolved - reaches out to Keycloak service and obtains the OIDC configuration
5. KeycloakDeployment is passed back to Keycloak auth framework
6. ... Something happens that throws an immediate socket timeout exception...
7. Method is never invoked
How do I figure out what's happening in step 6? I find it hard to believe it's an actual socket timeout after only 20ms, and everything I think it should be accessing is up and responsive. But I'm willing to be wrong...
---- Original Question ----
I'm trying to get Keycloak working with a Spring Boot REST service behind an nginx proxy. TLS is terminated at the nginx server. Everything is in docker containers, but on separate machines (no kube, no swarm, etc).
Everything seems good. I can login to the master realm, create a new realm, add users, etc. However when the REST service tries to contact the Keycloak server (through the nginx proxy), I'm getting an SSL timeout error:
2022-03-22 18:59:13.384 INFO 26 --- [nio-8080-exec-3] o.keycloak.adapters.KeycloakDeployment : Loaded URLs from https://hostname/auth/realms/myrealm/.well-known/openid-configuration
javax.net.ssl|WARNING|18|http-nio-8080-exec-3|2022-03-22 18:59:13.409 UTC|SSLSocketImpl.java:1672|handling exception (
"throwable" : {
java.net.SocketTimeoutException: Read timed out
at java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:283)
...
Thing is, this "timeout" happens almost instantaneously after the .well-known/openid-configuration response, so I'm skeptical it's even actually a timeout, unless the threshold is set to like 10ms by default or something.
I cranked up javax.net logging with System.setProperty("javax.net.debug", "ssl:all");, and I can't see anything obvious:
2022-03-22 18:59:13.384 INFO 26 --- [nio-8080-exec-3] o.keycloak.adapters.KeycloakDeployment : Loaded URLs from https://hostname/auth/realms/myrealm/.well-known/openid-configuration
javax.net.ssl|DEBUG|18|http-nio-8080-exec-3|2022-03-22 18:59:13.385 UTC|SSLSocketOutputRecord.java:331|WRITE: TLSv1.2 application_data, length = 11
javax.net.ssl|DEBUG|18|http-nio-8080-exec-3|2022-03-22 18:59:13.385 UTC|SSLCipher.java:1770|Plaintext before ENCRYPTION (
0000: 07 00 00 00 03 63 6F 6D 6D 69 74 .....commit
)
javax.net.ssl|DEBUG|18|http-nio-8080-exec-3|2022-03-22 18:59:13.388 UTC|SSLSocketOutputRecord.java:346|Raw write (
0000: 17 03 03 00 23 00 00 00 00 00 00 00 80 9C 76 46 ....#.........vF
0010: 44 FA F9 3A A4 9B A1 B2 D8 9B 6A 69 76 C7 1A 3D D..:......jiv..=
0020: 94 C4 40 D2 D8 F2 E4 7E ..#.....
)
javax.net.ssl|DEBUG|18|http-nio-8080-exec-3|2022-03-22 18:59:13.388 UTC|SSLSocketInputRecord.java:488|Raw read (
0000: 17 03 03 00 23 ....#
)
javax.net.ssl|DEBUG|18|http-nio-8080-exec-3|2022-03-22 18:59:13.388 UTC|SSLSocketInputRecord.java:214|READ: TLSv1.2 application_data, length = 35
javax.net.ssl|DEBUG|18|http-nio-8080-exec-3|2022-03-22 18:59:13.388 UTC|SSLSocketInputRecord.java:488|Raw read (
0000: 15 13 12 41 88 AD 18 6F B8 5E 25 90 9D BA 23 BF ...A...o.^%...#.
0010: B3 A5 A9 5E 61 FA 77 BD AE A4 C0 57 B2 1D 5B 18 ...^a.w....W..[.
0020: E8 C7 77 ..w
)
javax.net.ssl|DEBUG|18|http-nio-8080-exec-3|2022-03-22 18:59:13.388 UTC|SSLSocketInputRecord.java:247|READ: TLSv1.2 application_data, length = 35
javax.net.ssl|DEBUG|18|http-nio-8080-exec-3|2022-03-22 18:59:13.388 UTC|SSLCipher.java:1672|Plaintext after DECRYPTION (
0000: 07 00 00 01 00 00 00 00 00 00 00 ...........
)
javax.net.ssl|DEBUG|18|http-nio-8080-exec-3|2022-03-22 18:59:13.388 UTC|SSLSocketOutputRecord.java:331|WRITE: TLSv1.2 application_data, length = 21
javax.net.ssl|DEBUG|18|http-nio-8080-exec-3|2022-03-22 18:59:13.389 UTC|SSLCipher.java:1770|Plaintext before ENCRYPTION (
0000: 11 00 00 00 03 53 45 54 20 61 75 74 6F 63 6F 6D .....SET autocom
0010: 6D 69 74 3D 31 mit=1
)
javax.net.ssl|DEBUG|18|http-nio-8080-exec-3|2022-03-22 18:59:13.390 UTC|SSLSocketOutputRecord.java:346|Raw write (
0000: 17 03 03 00 2D 00 00 00 00 00 00 00 81 6B 87 27 ....-........k.'
0010: 9C 91 53 E2 F8 70 1C D4 FA F3 4A 79 1B B0 11 05 ..S..p....Jy....
0020: 13 3E 4F 10 A8 E8 43 B3 BB FA 1E 48 82 DF 59 25 .>O...C....H..Y%
0030: CF 9D ..
)
javax.net.ssl|DEBUG|18|http-nio-8080-exec-3|2022-03-22 18:59:13.392 UTC|SSLSocketInputRecord.java:488|Raw read (
0000: 17 03 03 00 23 ....#
)
javax.net.ssl|DEBUG|18|http-nio-8080-exec-3|2022-03-22 18:59:13.392 UTC|SSLSocketInputRecord.java:214|READ: TLSv1.2 application_data, length = 35
javax.net.ssl|DEBUG|18|http-nio-8080-exec-3|2022-03-22 18:59:13.392 UTC|SSLSocketInputRecord.java:488|Raw read (
0000: 15 13 12 41 88 AD 18 70 E3 20 7E 21 DA B0 24 28 ...A...p. .!..$(
0010: EF 6D EB BC 5C CE 5D 94 1D BC 04 BB F9 D1 3D 72 .m..\.].......=r
0020: 0C 71 83 .q.
)
javax.net.ssl|DEBUG|18|http-nio-8080-exec-3|2022-03-22 18:59:13.392 UTC|SSLSocketInputRecord.java:247|READ: TLSv1.2 application_data, length = 35
javax.net.ssl|DEBUG|18|http-nio-8080-exec-3|2022-03-22 18:59:13.393 UTC|SSLCipher.java:1672|Plaintext after DECRYPTION (
0000: 07 00 00 01 00 00 00 02 00 00 00 ...........
)
javax.net.ssl|WARNING|18|http-nio-8080-exec-3|2022-03-22 18:59:13.409 UTC|SSLSocketImpl.java:1672|handling exception (
"throwable" : {
java.net.SocketTimeoutException: Read timed out
at java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:283)
...
So between the well-known configuration call and the final timeout exception:
18:59:13.384
18:59:13.409
only 25ms pass. It seems hard to believe that a legitimate timeout exception would be thrown that quickly, but I can't find any further clue as to what is causing it. The Keycloak service definitely IS reachable, and responds quite quickly.
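To help read the "Plaintext before ENCRYPTION" dumps above, here is a small sketch that splits those bytes into a header and payload. It assumes the bytes follow the MySQL client/server packet layout (3-byte little-endian payload length, 1-byte sequence id, then the payload). That layout is my assumption; the logs do not say which protocol this stream carries. The helper names are hypothetical.

```python
def parse_hex_dump(text):
    """Turn space-separated hex byte pairs (as printed in the log) into bytes."""
    return bytes.fromhex(text.replace(" ", ""))


def decode_packet(raw):
    """Split raw bytes into (payload_length, sequence_id, payload).

    Assumed layout: 3-byte little-endian length, 1-byte sequence id, payload.
    """
    length = int.from_bytes(raw[0:3], "little")
    sequence_id = raw[3]
    payload = raw[4:4 + length]
    return length, sequence_id, payload


# Bytes copied from the two "Plaintext before ENCRYPTION" blocks in the log.
first = parse_hex_dump("07 00 00 00 03 63 6F 6D 6D 69 74")
second = parse_hex_dump(
    "11 00 00 00 03 53 45 54 20 61 75 74 6F 63 6F 6D 6D 69 74 3D 31"
)

for raw in (first, second):
    length, sequence_id, payload = decode_packet(raw)
    # payload[0] is a one-byte command code; the rest is readable text.
    print(length, sequence_id, payload[0], payload[1:].decode("ascii"))
```

Under that assumption the declared lengths match the payloads exactly and the text parts read "commit" and "SET autocommit=1". This may hint that this particular TLS stream is SQL traffic rather than the HTTPS call to Keycloak, but that is an inference from the bytes, not something the logs state.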
Nothing at all in the Keycloak logs, and not terribly much in the nginx logs:
[--- AUTH ---] [22/Mar/2022:19:19:56 +0000] [200] "POST /auth/realms/myrealm/protocol/openid-connect/token HTTP/1.1" "https://web.mycompany.com/"
[--- AUTH ---] [22/Mar/2022:19:19:57 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
[--- AUTH ---] [22/Mar/2022:19:19:57 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
2022/03/22 19:19:57 [info] 73#73: *659 client #.#.#.# closed keepalive connection
[--- AUTH ---] [22/Mar/2022:19:19:57 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
[--- AUTH ---] [22/Mar/2022:19:19:57 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
[--- AUTH ---] [22/Mar/2022:19:19:58 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
[--- AUTH ---] [22/Mar/2022:19:19:58 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
2022/03/22 19:19:58 [info] 73#73: *670 client #.#.#.# closed keepalive connection
2022/03/22 19:19:58 [info] 73#73: *668 client #.#.#.# closed keepalive connection
[--- AUTH ---] [22/Mar/2022:19:19:58 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
[--- AUTH ---] [22/Mar/2022:19:19:58 +0000] [200] "GET /auth/realms/myrealm/protocol/openid-connect/certs HTTP/1.1" "-"
[--- AUTH ---] [22/Mar/2022:19:19:58 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
[--- AUTH ---] [22/Mar/2022:19:19:58 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
[--- AUTH ---] [22/Mar/2022:19:19:59 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
[--- AUTH ---] [22/Mar/2022:19:19:59 +0000] [200] "GET /auth/realms/myrealm/protocol/openid-connect/certs HTTP/1.1" "-"
2022/03/22 19:19:59 [info] 73#73: *677 client #.#.#.# closed keepalive connection
2022/03/22 19:19:59 [info] 73#73: *675 client #.#.#.# closed keepalive connection
[--- AUTH ---] [22/Mar/2022:19:19:59 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
[--- AUTH ---] [22/Mar/2022:19:19:59 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
[--- AUTH ---] [22/Mar/2022:19:19:59 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
[--- AUTH ---] [22/Mar/2022:19:20:00 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
2022/03/22 19:20:00 [info] 73#73: *664 client #.#.#.# closed keepalive connection
2022/03/22 19:20:00 [info] 73#73: *666 client #.#.#.# closed keepalive connection
2022/03/22 19:20:00 [info] 73#73: *672 client #.#.#.# closed keepalive connection
2022/03/22 19:20:00 [info] 73#73: *682 client #.#.#.# closed keepalive connection
2022/03/22 19:20:00 [info] 73#73: *684 client #.#.#.# closed keepalive connection
2022/03/22 19:20:00 [info] 74#74: *686 client #.#.#.# closed keepalive connection
[--- AUTH ---] [22/Mar/2022:19:20:00 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
[--- AUTH ---] [22/Mar/2022:19:20:00 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
[--- AUTH ---] [22/Mar/2022:19:20:01 +0000] [200] "GET /auth/realms/myrealm/.well-known/openid-configuration HTTP/1.1" "-"
[--- AUTH ---] [22/Mar/2022:19:20:01 +0000] [200] "GET /auth/realms/myrealm/protocol/openid-connect/certs HTTP/1.1" "-"
[--- REST ---] [22/Mar/2022:19:20:01 +0000] [401] "GET /api/v1/stuff HTTP/1.1" "https://web.mycompany.com/"
Any help as to what's going on?
Other notes:
I've gone through the Setting Up a load balancer or proxy instructions from Keycloak. Specifically, the nginx server has the following:
upstream AUTH {
server #.#.#.#:8080;
server #.#.#.#:8080;
}
...
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Host $http_host;
proxy_pass http://AUTH;
I think that covers bullet points 1 and 2.
Bullet point 3 is covered by the keycloak server. The standalone-ha.xml file has:
<subsystem xmlns="urn:jboss:domain:undertow:12.0" default-server="default-server" default-virtual-host="default-host" default-servlet-container="default" default-security-domain="other" statistics-enabled="true">
<buffer-cache name="default"/>
<server name="default-server">
<ajp-listener name="ajp" socket-binding="ajp"/>
<http-listener name="default" socket-binding="http" redirect-socket="https" proxy-address-forwarding="${env.PROXY_ADDRESS_FORWARDING:false}" enable-http2="true"/>
<https-listener name="https" socket-binding="https" ssl-context="applicationSSC" proxy-address-forwarding="${env.PROXY_ADDRESS_FORWARDING:false}" enable-http2="true"/>
<host name="default-host" alias="localhost">
<location name="/" handler="welcome-content"/>
<http-invoker http-authentication-factory="application-http-authentication"/>
</host>
</server>
and the container is launched with the environment variable
PROXY_ADDRESS_FORWARDING=true
I've verified with a false login that Keycloak is seeing the end user's IP address rather than the nginx server:
15:17:07,304 WARN [org.keycloak.events] (default task-14) type=LOGIN_ERROR, realmId=master, clientId=security-admin-console, userId=null, ipAddress=#.#.#.#, error=user_not_found, auth_method=openid-connect, auth_type=code, redirect_uri=https://host.name/auth/admin/master/console/#/realms/MyRealm/login-settings, code_id=..., username=foo, authSessionParentId=..., authSessionTabId=...
Where #.#.#.# is my IP address.

Py4JJavaError: An error occurred while calling o840.showString

I am trying to parse a log file with millions of records. It contains the host name, timestamp, status code, etc. I successfully parsed the host, status code and URL, but when I try to parse the timestamp I get an error. Here is my code:
lines=sc.textFile(filepath)
df_log= lines.map(lambda x: Row(header=x)).toDF()
timestamp_pattern= r'\[\d{2}\/\w{3}\/\d{4}\:\d{2}\:\d{2}\:\d{2}\s\S+\d{4}]'
df2=df_log.select(regexp_extract(col('header'),timestamp_pattern,1).alias("timestamp"))
Everything works fine up to here. But when I then call df2.show(10), I get the following error:
Py4JJavaErrorTraceback (most recent call last) <ipython-input-112-5a86d13b2926> in <module>()
----> 1 df2.show(1)
/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/sql/dataframe.pyc in show(self, n, truncate)
316 """
317 if isinstance(truncate, bool) and truncate:
--> 318 print(self._jdf.showString(n, 20))
319 else:
320 print(self._jdf.showString(n, int(truncate)))
/opt/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args) 1131 answer = self.gateway_client.send_command(command) 1132 return_value
= get_return_value(
-> 1133 answer, self.gateway_client, self.target_id, self.name) 1134 1135 for temp_arg in temp_args:
/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/sql/utils.pyc in deco(*a, **kw)
61 def deco(*a, **kw):
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
65 s = e.java_exception.toString()
/opt/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
317 raise Py4JJavaError(
318 "An error occurred while calling {0}{1}{2}.\n".
--> 319 format(target_id, ".", name), value)
320 else:
321 raise Py4JError(
Py4JJavaError: An error occurred while calling o965.showString. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, ip-20-0-31-210.ec2.internal, executor 2): java.lang.IndexOutOfBoundsException: No group 1 at java.util.regex.Matcher.group(Matcher.java:538) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1430) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1417) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1417) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:797) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:797) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:797) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1645) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1600) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1589) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:623) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1930) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1943) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1956) at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:333) at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38) at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2378) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2772) at 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2377) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2384) at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2120) at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2119) at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2802) at org.apache.spark.sql.Dataset.head(Dataset.scala:2119) at org.apache.spark.sql.Dataset.take(Dataset.scala:2334) at org.apache.spark.sql.Dataset.showString(Dataset.scala:248) at sun.reflect.GeneratedMethodAccessor71.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:280) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:214) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.IndexOutOfBoundsException: No group 1 at java.util.regex.Matcher.group(Matcher.java:538) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826) at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ... 1 more
Please help me with this.
Here is an excerpt of the records in the log file for reference:
109.169.248.247 - - [12/Dec/2015:18:25:11 +0100] GET /administrator/ HTTP/1.1 200 4263 - Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0 -
109.169.248.247 - - [12/Dec/2015:18:25:11 +0100] GET /administrator/ HTTP/1.1 200 4263 - Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0 -
109.169.248.247 - - [12/Dec/2015:18:25:11 +0100] GET /administrator/ HTTP/1.1 200 4263 - Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0 -
109.169.248.247 - - [12/Dec/2015:18:25:11 +0100] POST /administrator/index.php HTTP/1.1 200 4494 http://almhuette-raith.at/administrator/ Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0 -
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] GET /administrator/ HTTP/1.1 200 4263 - Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0 -
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] POST /administrator/index.php HTTP/1.1 200 4494 http://almhuette-raith.at/administrator/ Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0 -
83.167.113.100 - - [12/Dec/2015:18:31:25 +0100] GET /administrator/ HTTP/1.1 200 4263 - Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0 -
83.167.113.100 - - [12/Dec/2015:18:31:25 +0100] POST /administrator/index.php HTTP/1.1 200 4494 http://almhuette-raith.at/administrator/ Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0 -
95.29.198.15 - - [12/Dec/2015:18:32:10 +0100] GET /administrator/ HTTP/1.1 200 4263 - Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0 -
95.29.198.15 - - [12/Dec/2015:18:32:11 +0100] POST /administrator/index.php HTTP/1.1 200 4494 http://almhuette-raith.at/administrator/ Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0 -
109.184.11.34 - - [12/Dec/2015:18:32:56 +0100] GET /administrator/ HTTP/1.1 200 4263 - Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0 -
109.184.11.34 - - [12/Dec/2015:18:32:56 +0100] POST /administrator/index.php HTTP/1.1 200 4494 http://almhuette-raith.at/administrator/ Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0 -
Change the last line to:
timestamp_pattern= r'\[\d{2}\/\w{3}\/\d{4}\:\d{2}\:\d{2}\:\d{2}\s\S+\d{4}]'
df2=df_log.select(regexp_extract(col('header'),timestamp_pattern,0).alias("timestamp"))
Note: I changed the group index from 1 to 0, since your timestamp_pattern contains no capturing group. Index 0 refers to the entire match.
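The same group-index behavior can be reproduced outside Spark with Python's re module. This is only an illustration of how group 0 and group 1 differ, not Spark code. The variable grouped_pattern is a hypothetical variant I added to show what a pattern with one capturing group would look like.

```python
import re

# Pattern from the question: no parentheses, so it has no capturing group.
timestamp_pattern = r'\[\d{2}\/\w{3}\/\d{4}\:\d{2}\:\d{2}\:\d{2}\s\S+\d{4}]'

# Hypothetical variant: one capturing group around the text inside the brackets.
grouped_pattern = r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}\s\S+\d{4})\]'

line = ('109.169.248.247 - - [12/Dec/2015:18:25:11 +0100] '
        'GET /administrator/ HTTP/1.1 200 4263 -')

match = re.search(timestamp_pattern, line)

# Group 0 always exists and is the whole match, brackets included.
whole = match.group(0)

# Group 1 does not exist for this pattern, so asking for it raises an error.
try:
    match.group(1)
    group_one_error = None
except IndexError as exc:
    group_one_error = str(exc)

# With a capturing group in the pattern, group 1 is the text inside the brackets.
inner = re.search(grouped_pattern, line).group(1)

print(whole)
print(group_one_error)
print(inner)
```

So there are two ways out: keep the pattern and ask for group 0, or add parentheses around the part you want and keep asking for group 1.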

Responding with a stream sometimes results in "Connection reset by peer" error

In our application we have routes that are streaming JSON documents. Here is an example:
/** GET api/1/tenant/(tenantId)/ads/ */
def getAllAdsByOwner(advertiserId: AdvertiserId): Route =
get {
httpRequiredSession { username =>
getAllTenantAds(username, advertiserId) { (adSource: Source[AdView, Any]) =>
complete(adSource)
}
}
}
Most of the time it works as expected, but sometimes, especially when there are many simultaneous requests, the server starts resetting the connection just after the headers have been sent.
I tested with a script that requests this route with curl in a loop and aborts if a request fails. It ran for about 2 minutes before stopping. The trace when the request fails is the following:
<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
<= Recv header, 54 bytes (0x36)
0000: Access-Control-Allow-Origin: https://<...>
<= Recv header, 135 bytes (0x87)
0000: Access-Control-Expose-Headers: Content-Type, Authorization, Refr
0040: esh-Token, Set-Authorization, Set-Refresh-Token, asset-content-l
0080: ength
<= Recv header, 40 bytes (0x28)
0000: Access-Control-Allow-Credentials: true
<= Recv header, 24 bytes (0x18)
0000: Content-Encoding: gzip
<= Recv header, 23 bytes (0x17)
0000: X-Frame-Options: DENY
<= Recv header, 33 bytes (0x21)
0000: X-Content-Type-Options: nosniff
<= Recv header, 26 bytes (0x1a)
0000: Content-Security-Policy: .
<= Recv header, 20 bytes (0x14)
0000: default-src 'self';.
<= Recv header, 63 bytes (0x3f)
0000: style-src 'self' 'unsafe-inline' https://fonts.googleapis.com;.
<= Recv header, 59 bytes (0x3b)
0000: font-src 'self' 'unsafe-inline' https://fonts.gstatic.com;.
<= Recv header, 99 bytes (0x63)
0000: script-src 'self' 'unsafe-inline' 'unsafe-eval' https://*.google
0040: apis.com https://maps.gstatic.com;.
<= Recv header, 69 bytes (0x45)
0000: img-src 'self' data: https://*.googleapis.com https://*.gstatic.
0040: com;.
<= Recv header, 8 bytes (0x8)
0000:
<= Recv header, 26 bytes (0x1a)
0000: Server: akka-http/10.1.3
<= Recv header, 37 bytes (0x25)
0000: Date: Wed, 27 Jun 2018 15:20:24 GMT
<= Recv header, 28 bytes (0x1c)
0000: Transfer-Encoding: chunked
<= Recv header, 32 bytes (0x20)
0000: Content-Type: application/json
<= Recv header, 2 bytes (0x2)
0000:
== Info: Recv failure: Connection reset by peer
== Info: stopped the pause stream!
== Info: Closing connection 0
curl: (56) Recv failure: Connection reset by peer
The same request inspected in Wireshark:
screen shot
Reading the logs didn't give any hint about the probable source of the problem. The response is logged as successful:
[27-06-2018 19:44:52.837][INFO] access: 'GET /api/1/tenant/ca764a91-8616-409c-8f08-c64a40d3fc07/ads' 200 596ms
Versions of used software:
Scala: 2.11.11
akka: 2.5.13
akka-http: 10.1.3
Configuration:
akka.conf
akka-http-core.conf
I tried increasing akka.http.host-connection-pool.max-connections to 128 but it didn't help. Maybe someone has an idea whether this is a bug in akka-http or a configuration problem?
If there is no I/O on your open connection for the duration of idle-timeout, Akka will close the connection, which often appears as a "connection reset by peer" error. Try increasing the akka.http.server.idle-timeout value.
Because your akka.http.server.request-timeout value is the same as akka.http.server.idle-timeout, the two timeouts race each other when there is no I/O. Sometimes you will see a 503; other times you will get a connection reset error.

Using sed, delete everything between two characters

How can I delete everything (symbols, whitespace, characters, words) between two characters in a line?
My 5-line file is:
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.1)" 120.94.30.12 264 556 -
"Skype for Macintosh" 120.94.30.9 1038 482 -
-129.94.30.4 217 309 -
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.1)" 120.94.30.8 1197 747 -
"¢¢HttpClient" 120.94.30.12 594 231 -
I want to delete everything that comes between " and " (including the " characters), so that the required output is:
120.94.30.12 264 556 -
120.94.30.9 1038 482 -
-120.94.30.4 217 309 -
120.94.30.8 1197 747 -
120.94.30.12 594 231 -
You mean like this?
sed 's/"[^"]*"//' file
echo '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.1)" 120.94.30.12 264 556 -' |\
sed -e 's/".*"\(.*\)/\1/g'
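The two sed expressions above differ in greediness: "[^"]*" stops at the next quote, while ".*" runs to the last quote on the line. Here is a minimal sketch of that difference using Python's re module as a stand-in for sed. The function names are mine, and the two_quoted sample line is hypothetical; it is not from the question's file.

```python
import re


def strip_first_quoted(text):
    # Same idea as: sed 's/"[^"]*"//'
    # Removes the first quoted string; [^"]* cannot run past a closing quote.
    return re.sub(r'"[^"]*"', '', text, count=1)


def strip_greedy(text):
    # Same idea as: sed 's/".*"\(.*\)/\1/'
    # ".*" is greedy, so it spans from the first quote to the LAST quote.
    return re.sub(r'".*"(.*)', r'\1', text, count=1)


line = '"Skype for Macintosh" 120.94.30.9 1038 482 -'
unquoted = '-129.94.30.4 217 309 -'
two_quoted = '"a" 1.2.3.4 "b" 5'  # hypothetical line with two quoted strings

for sample in (line, unquoted, two_quoted):
    print(repr(strip_first_quoted(sample)), repr(strip_greedy(sample)))
```

On the question's data both give the same result, because each line has at most one quoted string. They diverge only on lines like two_quoted. Both also leave the space that followed the closing quote, so the result needs a .lstrip() here (or an extra space in the sed pattern) to match the required output exactly.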