pymongo.errors.ConnectionFailure: timed out from an ubuntu ec2 instance running scrapyd - mongodb

So... I'm running scrapyd on my ubuntu ec2 instance after following this post: http://www.dataisbeautiful.io/deploying-scrapy-ec2/
However, pymongo apparently can't connect to my MongoLab database; the scrapyd logs on the Ubuntu EC2 instance say:
pymongo.errors.ConnectionFailure: timed out
I'm a real noob when it comes to back-end stuff, so I don't have any idea what could be causing this. When I run scrapyd from localhost, it works fine and saves the scraped data to my MongoLab db. For the scrapyd running on the EC2 instance, I can access the scrapyd GUI by going to the EC2 address at port 6800 (equivalent to scrapyd's localhost:6800), but that's about it. Curling
curl http://aws-ec2-link:6800/schedule.json -d project=sportslab_scrape -d spider=max -d max_url="http://www.maxpreps.com/high-schools/de-la-salle-spartans-(concord,ca)/football/stats.htm"
gives back 'status': 'okay' and I can see the job appear, but no items are produced and the log only shows
2014-11-17 02:20:13+0000 [scrapy] INFO: Scrapy 0.24.4 started (bot: sportslab_scrape_outer)
2014-11-17 02:20:13+0000 [scrapy] INFO: Optional features available: ssl, http11
2014-11-17 02:20:13+0000 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'sportslab_scrape.spiders', 'SPIDER_MODULES': ['sportslab_scrape.spiders'], 'FEED_URI': 'items/sportslab_scrape/max/4299afa26e0011e4a543060f585a893f.jl', 'LOG_FILE': 'logs/sportslab_scrape/max/4299afa26e0011e4a543060f585a893f.log', 'BOT_NAME': 'sportslab_scrape_outer'}
2014-11-17 02:20:13+0000 [scrapy] INFO: Enabled extensions: FeedExporter, LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-11-17 02:20:13+0000 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-11-17 02:20:13+0000 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
Anyone got some helpful insights for my issue? Thanks!
edit: Connection code added.
Settings.py
MONGODB_HOST = 'mongodb://user:pass@asdf.mongolab.com:38839/sportslab_mongodb'
MONGODB_PORT = 38839 # Change in prod
MONGODB_DATABASE = "sportslab_mongodb" # Change in prod
MONGODB_COLLECTION = "sportslab"
Scrapy's Pipeline.py
from pymongo import Connection
from scrapy.conf import settings

class MongoDBPipeline(object):
    def __init__(self):
        connection = Connection(settings['MONGODB_HOST'], settings['MONGODB_PORT'])
        db = connection[settings['MONGODB_DATABASE']]
        self.collection = db[settings['MONGODB_COLLECTION']]

    def process_item(self, item, spider):
        self.collection.insert(dict(item))
        return item

I solved the issue. Initially, I had set up my EC2 security group's outbound rules as:
Outbound
Type: HTTP, Protocol: TCP, Port Range: 80, Destination: 0.0.0.0/0
Type: Custom, Protocol: TCP, Port Range: 6800, Destination: 0.0.0.0/0
Type: HTTPS, Protocol: TCP, Port Range: 443, Destination: 0.0.0.0/0
However, this wasn't enough: I also needed a Custom TCP rule for the port of the MongoLab db I was connecting to, which looks like this:
Type: Custom, Protocol: TCP, Port Range: 38839, Destination: 0.0.0.0/0
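A quick way to confirm whether an outbound port is blocked is to attempt a plain TCP connection from the instance itself. The following is only a minimal sketch using the standard library; the helper name `can_reach` is made up for illustration, and it checks reachability only, not MongoDB authentication.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS lookup failures.
        return False
```

Run on the EC2 instance, something like `can_reach("asdf.mongolab.com", 38839)` (host and port taken from the settings in the question) should return False while the outbound rule is missing and True once it has been added, assuming nothing else is blocking the connection.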

Related

JMX Connection refused on Kubernetes with AdoptOpenJDK OpenJ9

My team and I are trying to move our microservices, which run on Kubernetes, to OpenJ9 (openjdk8-openj9). However, we have run into a problem with the JMX configuration.
We get a connection refused error when we try to connect with jvisualvm (through a Kubernetes port-forward).
We haven't changed our configuration, except for switching from HotSpot to OpenJ9.
The error :
E0312 17:09:46.286374 17160 portforward.go:400] an error occurred forwarding 1099 -> 1099: error forwarding port 1099 to pod XXXXXXX, uid : exit status 1: 2020/03/12 16:09:45 socat[31284] E connect(5, AF=2 127.0.0.1:1099, 16): Connection refused
The java options that we use :
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.port=1099
-Dcom.sun.management.jmxremote.rmi.port=1099
We are using the latest adoptopenjdk/openjdk8-openj9 Docker image.
Do you have any ideas?
Thank you !
Regards.
I managed to figure out why it wasn't working.
It turns out that to pass the JMX options to the service we were using the Kubernetes service descriptor in YAML. It looks like this:
- name: _JAVA_OPTIONS
  value: -Dzipkinserver.listOfServers=http://zipkin:9411 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.rmi.port=1099
I realized that the JMX properties in _JAVA_OPTIONS were not taken into account when the application is not launched with ENTRYPOINT in the Docker container.
So I passed the properties directly in the Dockerfile like this, and it works:
CMD ["java", "-Dcom.sun.management.jmxremote", "-Dcom.sun.management.jmxremote.authenticate=false", "-Dcom.sun.management.jmxremote.ssl=false", "-Dcom.sun.management.jmxremote.local.only=false", "-Dcom.sun.management.jmxremote.port=1099", "-Dcom.sun.management.jmxremote.rmi.port=1099", "-Djava.rmi.server.hostname=127.0.0.1", "-cp","app:app/lib/*","OurMainClass"]
It's also possible to keep _JAVA_OPTIONS and set up an ENTRYPOINT in the Dockerfile.
Thanks!

Received AliveMessage from a peer with the same PKI-ID as myself

I am attempting to port the Hyperledger Fabric Getting Started to Kubernetes, but I am struggling to get the peer1's to deploy. If I enable CORE_PEER_GOSSIP_BOOTSTRAP, I receive the error "Received AliveMessage from a peer with the same PKI-ID as myself".
How can I debug a peer reportedly having the same PKI-ID as another?
Using this as a starting point:
https://hyperledger-fabric.readthedocs.io/en/latest/getting_started.html
I am able to create:
orderer and cli pods in default namespace
peer0's one in each org1|org2 namespace.
peer1's but only if I disable (comment out) CORE_PEER_GOSSIP_BOOTSTRAP
If I enable CORE_PEER_GOSSIP_BOOTSTRAP for the peer1's, I receive the following warning and error:
[gossip/gossip#10.0.0.10:7051] NewGossipService -> WARN 01c External endpoint is empty, peer will not be accessible outside of its organization
...
[gossip/discovery#10.0.0.10:7051] handleAliveMessage -> ERRO 02a Bad configuration detected: Received AliveMessage from a peer with the same PKI-ID as myself: tag:EMPTY alive_msg:<membership:<pki_id:"[[REDACTED]]" > timestamp:<inc_number:1495468533769417608 seq_num:416 > >
In order to better map the Orderer, Peers to DNS names, I'm using Kubernetes Namespaces and this configuration:
OrdererOrgs:
  - Name: Orderer
    Domain: default.svc.cluster.local
    Specs:
      - Hostname: orderer
PeerOrgs:
  - Name: Org1
    Domain: org1.svc.cluster.local
    Template:
      Count: 2
    Users:
      Count: 2
  - Name: Org2
    Domain: org2.svc.cluster.local
    Template:
      Count: 2
    Users:
      Count: 2
In order to expose the peer0's to the other peers in the org and to expose the orderer, I have ClusterIP services for the peer0's (selecting only the peer0's) and orderer. It's inelegant but I'm trying to get it to work before I get it working more beautifully.
I am able to resolve orderer.default.svc.cluster.local, peer0.org1.svc.cluster.local, and peer0.org2.svc.cluster.local using nslookup from within a pod deployed to default on the cluster.
Absent a curl-like tool for gRPC, I am able to open sockets against these endpoints on 7051 and 7053.
First, make sure you are using the right certificates.
Second, verify that your environment/configuration for gossip is set correctly
environment:
  - CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer1.org1.example.com:8051
  - CORE_PEER_GOSSIP_BOOTSTRAP=peer0.org1.example.com:7051
  - CORE_PEER_GOSSIP_ENDPOINT=peer0.org1.example.com:7051
OR in core.yaml
peer:
  gossip:
    bootstrap: peer0.org1.example.com:7051
    externalEndpoint: peer1.org1.example.com:8051
    endpoint: peer0.org1.example.com:7051
Edit: Also make sure that you have properly set up your CA.
Hope this helps; it worked for me, and I was able to connect the peers.
If the peers are started from the same node, it's possible that you are mounting the same crypto material (the path to the mspconfig directory) for both peers. If that is the case, give each peer its own directory holding its own certificates, update the respective MSP paths in the docker-compose file, and try again.
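One way to check for shared crypto material is to compare the signing certificates in the two MSP directories. The sketch below is only an illustration: the function names are hypothetical, and it assumes each MSP directory has the usual signcerts subdirectory and that two peers presenting the same certificate would end up with the same PKI-ID.

```python
import hashlib
from pathlib import Path


def cert_fingerprints(msp_dir):
    """Return SHA-256 hex digests of every file under <msp_dir>/signcerts."""
    signcerts = Path(msp_dir) / "signcerts"
    return {
        hashlib.sha256(p.read_bytes()).hexdigest()
        for p in signcerts.iterdir()
        if p.is_file()
    }


def share_identity(msp_dir_a, msp_dir_b):
    """True if the two MSP directories hold at least one identical signing cert."""
    return bool(cert_fingerprints(msp_dir_a) & cert_fingerprints(msp_dir_b))
```

If `share_identity` returns True for the directories mounted into peer0 and peer1, both peers are being started with the same certificate and need separate crypto material.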

Spring Cloud - Registry Service port customization

I'd like to customize the Eureka port with Spring Cloud.
With the default port below, the services registry registers itself correctly (as shown in the provided GUI):
spring:
  application:
    name: services-registry
server:
  port: 8761
eureka:
  instance:
    hostname: localhost
    nonSecurePort: ${server.port}
  client:
    register-with-eureka: true
    fetch-registry: false
    service-url:
      default-zone: http://${eureka.instance.hostname}:${server.port}/eureka/
But if I just change server.port to 8787, no service can register itself, not even the services registry itself.
2017-01-09 16:18:21.584 WARN 17496 --- [nfoReplicator-0] c.n.d.s.t.d.RetryableEurekaHttpClient : Request execution failure
2017-01-09 16:18:21.584 WARN 17496 --- [nfoReplicator-0] com.netflix.discovery.DiscoveryClient : DiscoveryClient_SERVICES-REGISTRY/xxx.org:services-registry:8787 - registration failed Cannot execute request on any known server
com.netflix.discovery.shared.transport.TransportException: Cannot execute request on any known server
...
2017-01-09 16:13:33.299 WARN 17496 --- [nfoReplicator-0] c.n.discovery.InstanceInfoReplicator : There was a problem with the instance info replicator
com.netflix.discovery.shared.transport.TransportException: Cannot execute request on any known server
Can someone explain this issue and save my day? Thanks!
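OK, got it: the label after the service-url property (which can be aliased as serviceUrl in YML) is a HashMap key, not a property name, so it has to be kept in camel case as defaultZone. With default-zone, the client presumably never finds a defaultZone entry and falls back to its built-in default of http://localhost:8761/eureka/, which would explain why registration only worked on port 8761.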
eureka.client.service-url.defaultZone=http://[myIP#]:8787/eureka
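The difference between a property name and a map key can be modelled in a few lines. This is not Spring's code, only a toy illustration: the function name is made up, and the fallback URL is what I understand the Eureka client's default to be.

```python
DEFAULT_ZONE_KEY = "defaultZone"
# Assumed client default; treat this value as an assumption, not a quote from Spring.
FALLBACK_URL = "http://localhost:8761/eureka/"


def resolve_service_url(service_url_map):
    """Look up the default zone by its exact key, else use the fallback URL."""
    return service_url_map.get(DEFAULT_ZONE_KEY, FALLBACK_URL)


# The key from the question's YAML is stored verbatim, so the lookup misses.
misspelled = {"default-zone": "http://localhost:8787/eureka/"}
# The camel-case key matches exactly.
correct = {"defaultZone": "http://localhost:8787/eureka/"}

print(resolve_service_url(misspelled))  # http://localhost:8761/eureka/
print(resolve_service_url(correct))     # http://localhost:8787/eureka/
```

Map keys are matched literally, so only the exact defaultZone spelling is found; anything else silently leaves the default in place.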

Datastax Cassandra Driver always attempts to connect to localhost, even though it's not configured to do so

So I have the following Client code:
def getCluster: Session = {
  import collection.JavaConversions._
  val endpoints = config.getStringList("cassandra.server")
  val keyspace = config.getString("cassandra.keyspace")
  val clusterBuilder = Cluster.builder
  endpoints.toTraversable.foreach { x =>
    clusterBuilder.addContactPoint(x)
  }
  val cluster = clusterBuilder.build
  cluster
    .getConfiguration
    .getProtocolOptions
    .setCompression(ProtocolOptions.Compression.LZ4)
  cluster.connect(keyspace)
}
which is shamelessly borrowed from the examples in datastax's driver documentation.
When I attempt to execute code with it, it always tries to connect to localhost, even though it's not configured for that...
In some cases, it will connect (basic reads) but for writes I get the following log message:
2016-07-07 11:34:31 DEBUG Connection:157 - Connection[/127.0.0.1:9042-10, inFlight=0, closed=false] Error connecting to /127.0.0.1:9042 (Connection refused: /127.0.0.1:9042)
2016-07-07 11:34:31 DEBUG STATES:404 - Defuncting Connection[/127.0.0.1:9042-10, inFlight=0, closed=false] because: [/127.0.0.1] Cannot connect
2016-07-07 11:34:31 DEBUG STATES:108 - [/127.0.0.1:9042] Connection[/127.0.0.1:9042-10, inFlight=0, closed=false] failed, remaining = 0
2016-07-07 11:34:31 DEBUG Connection:629 - Connection[/127.0.0.1:9042-10, inFlight=0, closed=true] closing connection
2016-07-07 11:34:31 DEBUG Cluster:1802 - Aborting onDown because a reconnection is running on DOWN host /127.0.0.1:9042
2016-07-07 11:34:31 DEBUG Cluster:1872 - Failed reconnection to /127.0.0.1:9042 ([/127.0.0.1] Cannot connect), scheduling retry in 512000 milliseconds
2016-07-07 11:34:31 DEBUG STATES:196 - [/127.0.0.1:9042] next reconnection attempt in 512000 ms
I can't figure out where/what I need to configure on the driver side (no local client, it's just the driver) to correct this issue
My guess is that this is caused by the configuration of the cassandra.yaml file on your Cassandra node(s). The two main settings that would affect this are broadcast_rpc_address and rpc_address. From the cassandra.yaml configuration reference:
broadcast_rpc_address
(Default: unset) RPC address to broadcast to drivers and other Cassandra nodes. This cannot be set to 0.0.0.0. If blank, it is set to the value of the rpc_address or rpc_interface. If rpc_address or rpc_interface is set to 0.0.0.0, this property must be set.
rpc_address
(Default: localhost) The listen address for client connections (Thrift RPC service and native transport).
If you leave both of these at their defaults, localhost is the address Cassandra advertises for client connections.
After the driver connects to a contact point, it queries that contact point's system.local and system.peers tables to determine which hosts to connect to. The addresses in those tables come from rpc_address/broadcast_rpc_address.
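The rules quoted above can be written out as a small function. This is only a sketch of the documented behaviour, not Cassandra's implementation: the function name is hypothetical and rpc_interface is left out for brevity.

```python
def advertised_rpc_address(rpc_address="localhost", broadcast_rpc_address=None):
    """Return the address a node would advertise to drivers, per the quoted rules."""
    if broadcast_rpc_address == "0.0.0.0":
        raise ValueError("broadcast_rpc_address cannot be 0.0.0.0")
    if broadcast_rpc_address:
        # An explicit broadcast address always wins.
        return broadcast_rpc_address
    if rpc_address == "0.0.0.0":
        raise ValueError(
            "rpc_address is 0.0.0.0, so broadcast_rpc_address must be set"
        )
    # Blank broadcast address: fall back to rpc_address (default: localhost).
    return rpc_address
```

With both settings left at their defaults the function returns localhost, which matches the 127.0.0.1 connection attempts in the driver log above.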

The Datastax cassandra community server 2.1.10 service on local computer started and then stopped

I am trying to configure a two-node Cassandra cluster on Windows Server 2008 R2.
So I installed the Cassandra community version on the servers (10.xxx.0.1, 10.xxx.0.2).
Then I stopped the service and edited the configuration .yaml file in the conf folder.
The changes are:
cluster_name
commented the num_tokens
gave the tokens in initial_token,
seeds as 10.xxx.0.1,10.xxx.0.2,
listen_addresses are their respective ip addresses which are 10.xxx.0.1,10.xxx.0.2,
rpc_addresses as 0.0.0.0,
endpointsnitch as gossip
I also changed the cassandra rackdc.properties file to dc=DC1 rack=RAC1.
I then saved and started back the service and opened the cqlsh, but it is not connecting. Below is the error:
2015-10-12 16:20:13 Commons Daemon procrun stderr initialized
If rpc_address is set to a wildcard address (0.0.0.0), then you must set broadcast_rpc_address to a value other than 0.0.0.0
Fatal configuration error; unable to start. See log for stacktrace.
..
ERROR 21:20:14 Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: If rpc_address is set to a wildcard address (0.0.0.0), then you must set broadcast_rpc_address to a value other than 0.0.0.0
at org.apache.cassandra.config.DatabaseDescriptor.applyAddressConfig(DatabaseDescriptor.java:285) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:443) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:136) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:168) [apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562) [apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651) [apache-cassandra-2.1.10.jar:2.1.10]
If you set rpc_address to 0.0.0.0, you also have to set broadcast_rpc_address, as described in http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html . I think the right value for broadcast_rpc_address is the node's own IP address.