How can I run Zeppelin with Kerberos in CDH 6.3.2?

Zeppelin 0.9.0 does not work with Kerberos.
I have added "zeppelin.server.kerberos.keytab" and "zeppelin.server.kerberos.principal" to zeppelin-site.xml,
but I still get the error "Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "bigdser5/10.3.87.27"; destination host is: "bigdser1":8020;".
I also added "spark.yarn.keytab" and "spark.yarn.principal" to the Spark interpreter settings, but it still does not work.
My spark-shell works with Kerberos.
My Kerberos setup steps:
1. kadmin.local -q "addprinc jzyc/hadoop"
2. kadmin.local -q "xst -k jzyc.keytab jzyc/hadoop@JJKK.COM"
3. copy jzyc.keytab to the other servers
4. kinit -kt jzyc.keytab jzyc/hadoop@JJKK.COM
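To verify the keytab and the ticket cache, the standard Kerberos tools can be used (a quick check, not from the original transcript):

klist -kt jzyc.keytab   # list the principals stored in the keytab
klist                   # show the ticket cache after kinit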
In my Livy interpreter I get the error "javax.servlet.ServletException: org.apache.hadoop.security.authentication.client.AuthenticationException: javax.security.auth.login.LoginException: No key to store".

INFO [2021-04-15 16:44:46,522] ({dispatcher-event-loop-1} Logging.scala[logInfo]:57) - Got an error when resolving hostNames. Falling back to /default-rack for all
INFO [2021-04-15 16:44:46,561] ({FIFOScheduler-interpreter_1099886208-Worker-1} Logging.scala[logInfo]:57) - Attempting to login to KDC using principal: jzyc/bigdser4@JOIN.COM
INFO [2021-04-15 16:44:46,574] ({FIFOScheduler-interpreter_1099886208-Worker-1} Logging.scala[logInfo]:57) - Successfully logged into KDC.
INFO [2021-04-15 16:44:47,124] ({FIFOScheduler-interpreter_1099886208-Worker-1} Logging.scala[logInfo]:57) - getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1346508100_40, ugi=jzyc/bigdser4@JOIN.COM (auth:KERBEROS)]] with renewer yarn/bigdser1@JOIN.COM
INFO [2021-04-15 16:44:47,265] ({FIFOScheduler-interpreter_1099886208-Worker-1} DFSClient.java[getDelegationToken]:700) - Created token for jzyc: HDFS_DELEGATION_TOKEN owner=jzyc/bigdser4@JOIN.COM, renewer=yarn, realUser=, issueDate=1618476287222, maxDate=1619081087222, sequenceNumber=171, masterKeyId=21 on ha-hdfs:nameservice1
INFO [2021-04-15 16:44:47,273] ({FIFOScheduler-interpreter_1099886208-Worker-1} Logging.scala[logInfo]:57) - getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1346508100_40, ugi=jzyc/bigdser4@JOIN.COM (auth:KERBEROS)]] with renewer jzyc/bigdser4@JOIN.COM
INFO [2021-04-15 16:44:47,278] ({FIFOScheduler-interpreter_1099886208-Worker-1} DFSClient.java[getDelegationToken]:700) - Created token for jzyc: HDFS_DELEGATION_TOKEN owner=jzyc/bigdser4@JOIN.COM, renewer=jzyc, realUser=, issueDate=1618476287276, maxDate=1619081087276, sequenceNumber=172, masterKeyId=21 on ha-hdfs:nameservice1
INFO [2021-04-15 16:44:47,331] ({FIFOScheduler-interpreter_1099886208-Worker-1} Logging.scala[logInfo]:57) - Renewal interval is 86400051 for token HDFS_DELEGATION_TOKEN
INFO [2021-04-15 16:44:47,492] ({dispatcher-event-loop-0} Logging.scala[logInfo]:57) - Got an error when resolving hostNames. Falling back to /default-rack for all
INFO [2021-04-15 16:44:47,493] ({FIFOScheduler-interpreter_1099886208-Worker-1} Logging.scala[logInfo]:57) - Scheduling renewal in 18.0 h.
INFO [2021-04-15 16:44:47,494] ({FIFOScheduler-interpreter_1099886208-Worker-1} Logging.scala[logInfo]:57) - Updating delegation tokens.
INFO [2021-04-15 16:44:47,521] ({FIFOScheduler-interpreter_1099886208-Worker-1} Logging.scala[logInfo]:57) - Updating delegation tokens for current user.

INFO [2021-04-23 11:49:29,658] ({qtp1640639994-103} ManagedInterpreterGroup.java[getOrCreateSession]:180) - Create Session: shared_session in InterpreterGroup: md-shared_process for user: anonymous
INFO [2021-04-23 11:49:29,659] ({qtp1640639994-103} InterpreterSetting.java[getOrCreateInterpreterGroup]:453) - Create InterpreterGroup with groupId: spark-shared_process for ExecutionContext{user='anonymous', noteId='2EYUV26VR', interpreterGroupId='null', defaultInterpreterGroup='spark', inIsolatedMode=false, startTime=}
INFO [2021-04-23 11:49:29,659] ({qtp1640639994-103} InterpreterSetting.java[createInterpreters]:823) - Interpreter org.apache.zeppelin.spark.SparkInterpreter created for user: anonymous, sessionId: shared_session
but I have enabled shiro.ini, yet the logs still show the user as anonymous.

In spark.jars you need the full path to the jar:
hdfs://bigdser1:8020/sparklib/tispark-assembly-2.3.14.jar
not a wildcard like
hdfs://bigdser1:8020/sparklib/*


Where to find the Spark log in Dataproc when running a job in cluster mode

I am running the following code as a job in Dataproc.
I could not find the logs in the console while running in 'cluster' mode.
import sys
import time
from datetime import datetime
from pyspark.sql import SparkSession
start_time = datetime.utcnow()
spark = SparkSession.builder.appName("check_confs").getOrCreate()
all_conf = spark.sparkContext.getConf().getAll()
print("\n\n=====\nExecuting at {}".format(datetime.utcnow()))
print(all_conf)
print("\n\n======================\n\n\n")
incoming_args = sys.argv
if len(incoming_args) > 1:
    sleep_time = int(incoming_args[1])
    print("Sleep time is {} seconds".format(sleep_time))
    if sleep_time > 0:
        time.sleep(sleep_time)
end_time = datetime.utcnow()
time_taken = (end_time - start_time).total_seconds()
print("Script execution completed in {} seconds".format(time_taken))
If I trigger the job with the deployMode property set to cluster, I cannot see the corresponding logs.
But if the job is triggered in the default mode, which is client mode, I am able to see the respective logs.
Below is the dictionary used for triggering the job, with "spark.submit.deployMode": "cluster":
{
    'placement': {
        'cluster_name': dataproc_cluster
    },
    'pyspark_job': {
        'main_python_file_uri': "gs://" + compute_storage + "/" + job_file,
        'args': trigger_params,
        "properties": {
            "spark.submit.deployMode": "cluster",
            "spark.executor.memory": "3155m",
            "spark.scheduler.mode": "FAIR",
        }
    }
}
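For completeness, this is roughly how I submit that dictionary with the google-cloud-dataproc Python client; the project id, region, bucket and file names below are placeholders, not the real values:

from google.cloud import dataproc_v1

project_id = "my-project-id"          # placeholder
region = "us-central1"                # placeholder
dataproc_cluster = "my_cluster_name"  # placeholder
compute_storage = "my-bucket"         # placeholder
job_file = "check_confs.py"           # placeholder
trigger_params = ["1"]                # sleep for 1 second, as in the run shown below

job = {
    'placement': {
        'cluster_name': dataproc_cluster
    },
    'pyspark_job': {
        'main_python_file_uri': "gs://" + compute_storage + "/" + job_file,
        'args': trigger_params,
        "properties": {
            "spark.submit.deployMode": "cluster",
            "spark.executor.memory": "3155m",
            "spark.scheduler.mode": "FAIR",
        }
    }
}

client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": "{}-dataproc.googleapis.com:443".format(region)}
)
operation = client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
result = operation.result()           # blocks until the job finishes
print(result.reference.job_id)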
21/12/07 19:11:27 INFO org.sparkproject.jetty.util.log: Logging initialized @3350ms to org.sparkproject.jetty.util.log.Slf4jLog
21/12/07 19:11:27 INFO org.sparkproject.jetty.server.Server: jetty-9.4.40.v20210413; built: 2021-04-13T20:42:42.668Z; git: b881a572662e1943a14ae12e7e1207989f218b74; jvm 1.8.0_292-b10
21/12/07 19:11:27 INFO org.sparkproject.jetty.server.Server: Started @3467ms
21/12/07 19:11:27 INFO org.sparkproject.jetty.server.AbstractConnector: Started ServerConnector@18528bea{HTTP/1.1, (http/1.1)}{0.0.0.0:40389}
21/12/07 19:11:28 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at ******-m/0.0.0.5:8032
21/12/07 19:11:28 INFO org.apache.hadoop.yarn.client.AHSProxy: Connecting to Application History server at ******-m/0.0.0.5:10200
21/12/07 19:11:29 INFO org.apache.hadoop.conf.Configuration: resource-types.xml not found
21/12/07 19:11:29 INFO org.apache.hadoop.yarn.util.resource.ResourceUtils: Unable to find 'resource-types.xml'.
21/12/07 19:11:30 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1638554180947_0014
21/12/07 19:11:31 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at ******-m/0.0.0.5:8030
21/12/07 19:11:33 INFO com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
=====
Executing at 2021-12-07 19:11:35.100277
[....... ('spark.yarn.historyServer.address', '****-m:18080'), ('spark.ui.proxyBase', '/proxy/application_1638554180947_0014'), ('spark.driver.appUIAddress', 'http://***-m.c.***-123456.internal:40389'), ('spark.sql.cbo.enabled', 'true')]
======================
Sleep time is 1 seconds
Script execution completed in 9.411261 seconds
21/12/07 19:11:36 INFO org.sparkproject.jetty.server.AbstractConnector: Stopped Spark@18528bea{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
Logs are not coming to the console while running in cluster mode:
21/12/07 19:09:04 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at ******-m/0.0.0.5:8032
21/12/07 19:09:04 INFO org.apache.hadoop.yarn.client.AHSProxy: Connecting to Application History server at ******-m/0.0.0.5:8032
21/12/07 19:09:05 INFO org.apache.hadoop.conf.Configuration: resource-types.xml not found
21/12/07 19:09:05 INFO org.apache.hadoop.yarn.util.resource.ResourceUtils: Unable to find 'resource-types.xml'.
21/12/07 19:09:06 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1638554180947_0013
When running jobs in cluster mode, the driver logs are in the Cloud Logging yarn-userlogs. See the doc:
By default, Dataproc runs Spark jobs in client mode, and streams the driver output for viewing as explained, below. However, if the user creates the Dataproc cluster by setting cluster properties to --properties spark:spark.submit.deployMode=cluster or submits the job in cluster mode by setting job properties to --properties spark.submit.deployMode=cluster, driver output is listed in YARN userlogs, which can be accessed in Logging.
We can access the logs using a query in the Logs Explorer in Google Cloud:
resource.type="cloud_dataproc_cluster" resource.labels.cluster_name="my_cluster_name"
resource.labels.cluster_uuid="aaaaa-123435-bbbbbb-ccccc"
severity=DEFAULT
jsonPayload.container_logname="stdout"
jsonPayload.message!=""
log_name="projects/my-project_id/logs/yarn-userlogs"
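If you prefer to pull the same entries programmatically, here is a minimal sketch with the google-cloud-logging Python client; the project id, cluster name and uuid are the same placeholders as in the query above:

from google.cloud import logging

client = logging.Client(project="my-project_id")  # placeholder project id

log_filter = '''
resource.type="cloud_dataproc_cluster"
resource.labels.cluster_name="my_cluster_name"
resource.labels.cluster_uuid="aaaaa-123435-bbbbbb-ccccc"
severity=DEFAULT
jsonPayload.container_logname="stdout"
jsonPayload.message!=""
log_name="projects/my-project_id/logs/yarn-userlogs"
'''

# Each matching entry is one stdout line captured from the driver in yarn-userlogs
for entry in client.list_entries(filter_=log_filter):
    print(entry.payload)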

Failed to establish connection to Neo4j using bolt scheme even after successfully enabling Bolt

I want to connect to the Neo4j database using my credentials. I am tunneling into a machine, and once that is done, I open my browser at the port: localhost:7474.
I tried both the neo4j and bolt schemes to connect at the URLs
bolt://<node_ip>:7687 and neo4j://<node_ip>:7687, but the connection times out.
I checked the logs but only found that the Bolt scheme is enabled:
bash-4.2$ tail -f /logs/debug.log
2021-07-02 21:26:03.323+0000 WARN [o.n.k.a.p.GlobalProcedures] Failed to load `org.apache.commons.logging.impl.LogKitLogger` from plugin jar `/home/sandbox/neo/plugins/apoc-4.2.0.2-all.jar`: org/apache/log/Logger
2021-07-02 21:26:03.946+0000 INFO [c.n.m.g.GlobalMetricsExtension] Sending metrics to CSV file at /home/sandbox/neo/metrics
2021-07-02 21:26:03.973+0000 INFO [o.n.b.BoltServer] Bolt enabled on 0.0.0.0:7687.
2021-07-02 21:26:03.974+0000 INFO [o.n.b.BoltServer] Bolt (Routing) enabled on 0.0.0.0:7688.
2021-07-02 21:26:03.974+0000 INFO [o.n.s.AbstractNeoWebServer$ServerComponentsLifecycleAdapter] Starting web server
2021-07-02 21:26:04.001+0000 INFO [o.n.s.m.ThirdPartyJAXRSModule] Mounted unmanaged extension [n10s.endpoint] at [/rdf]
2021-07-02 21:26:05.341+0000 INFO [c.n.s.e.EnterpriseNeoWebServer] Remote interface available at http://<node_ip>:7474/
2021-07-02 21:26:05.341+0000 INFO [o.n.s.AbstractNeoWebServer$ServerComponentsLifecycleAdapter] Web server started.
2021-07-02 21:35:34.565+0000 INFO [c.n.c.c.c.l.s.Segments] [system/00000000] Pruning SegmentFile{path=raft.log.0, header=SegmentHeader{formatVersion=2, recordOffset=56, prevFileLastIndex=-1, segmentNumber=0, prevIndex=-1, prevTerm=-1}}
2021-07-02 21:35:46.079+0000 INFO [c.n.c.c.c.l.s.Segments] [neo4j/32f6599b] Pruning SegmentFile{path=raft.log.0, header=SegmentHeader{formatVersion=2, recordOffset=56, prevFileLastIndex=-1, segmentNumber=0, prevIndex=-1, prevTerm=-1}}
The query log is empty, as I could not execute any query:
bash-4.2$ tail -f query.log
2021-07-02 21:25:52.510+0000 INFO Query started: id:1 - 1009 ms: 0 B - embedded-session neo4j - - call db.clearQueryCaches() - {} - runtime=pipelined - {}
2021-07-02 21:25:52.580+0000 INFO id:1 - 1080 ms: 112 B - embedded-session neo4j - - call db.clearQueryCaches() - {} - runtime=pipelined - {}
The other articles and answers I read were mostly about misconfiguration (wrong ports), but I don't think that is the case for me, since I checked in the debug.log file that my ports are fine.
FWIW, I am using 3 replicas for my Neo4j, and right now I am connecting to just one pod.
I am tunnelling both the ports:
ssh -L 7687:$IP:7687 -L 7474:$IP:7474 domain_name.com -N
Perhaps you've already checked this, but if not, can you ensure that port 7687 is also forwarded? When I tunnelled via the browser, my expectation was that 7474 would be sufficient, but it turned out that forwarding 7687 is also necessary.
So, instead of providing localhost in the connection string, I had made the silly mistake of writing down the actual IP, and that was the reason for the connection timeout.
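For anyone verifying the same thing, a quick connectivity test with the official neo4j Python driver run through the tunnel (credentials are placeholders):

from neo4j import GraphDatabase

# localhost, because both 7474 and 7687 are forwarded through the SSH tunnel
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "my_password"))

with driver.session() as session:
    print(session.run("RETURN 1 AS ok").single()["ok"])  # prints 1 on success

driver.close()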

Rundeck: how to correct the configuration of Rundeck to access it via the browser

I have a problem accessing Rundeck:
[2021-05-03T17:33:33,231] WARN beans.GenericTypeAwarePropertyDescriptor - Invalid JavaBean property 'exceptionMappings' being accessed! Ambiguous write methods found next to actually used [public void grails.plugin.springsecurity.web.authentication.AjaxAwareAuthenticationFailureHandler.setExceptionMappings(java.util.List)]: [public void org.springframework.security.web.authentication.ExceptionMappingAuthenticationFailureHandler.setExceptionMappings(java.util.Map)]
[2021-05-03T17:33:41,756] INFO rundeckapp.BootStrap - Starting Rundeck 3.3.10-20210301 (2021-03-02) ...
[2021-05-03T17:33:41,757] INFO rundeckapp.BootStrap - using rdeck.base config property: /var/lib/rundeck
[2021-05-03T17:33:41,768] INFO rundeckapp.BootStrap - loaded configuration: /etc/rundeck/framework.properties
[2021-05-03T17:33:41,805] INFO rundeckapp.BootStrap - RSS feeds disabled
[2021-05-03T17:33:41,806] INFO rundeckapp.BootStrap - Using jaas authentication
[2021-05-03T17:33:41,811] INFO rundeckapp.BootStrap - Preauthentication is disabled
[2021-05-03T17:33:41,918] INFO rundeckapp.BootStrap - Rundeck is ACTIVE: executions can be run.
[2021-05-03T17:33:42,283] WARN rundeckapp.BootStrap - [Development Mode] Usage of H2 database is recommended only for development and testing
[2021-05-03T17:33:42,590] INFO rundeckapp.BootStrap - Rundeck startup finished in 945ms
[2021-05-03T17:33:42,877] INFO rundeckapp.Application - Started Application in 32.801 seconds (JVM running for 35.608)
Grails application running at http://xxx.xxx.xxx.xxx:4440 in environment: production
Session terminated, killing shell...[2021-05-04T10:20:46,596] INFO rundeckapp.BootStrap - Rundeck Shutdown detected
...killed.
Can you help me, please?
By the way, I have installed a VM running Red Hat,
then I installed the Rundeck RPM.
From my physical machine, when I go to http://rundecknode_ip:4440,
the browser returns error 113 (no route to host), and on examining the logs I see what I have posted above.
When I run systemctl status rundeck, it is active (running).

Kerberos authentication (GSSAPI) in Apache Kafka 6.0 uses the pre-Windows 2000 name format

I am trying to secure Apache Kafka using SASL_SSL and the GSSAPI mechanism. Everything is working properly apart from the fact that the authentication names used by Kafka are the "pre-Windows 2000" formatted names instead of the "standard" new ones.
For instance, I declare a new Kafka broker in our Active Directory (I forgot to say that it is a Windows 10 version...):
User logon name: kafka/kafka1.myfqdn.com@MYFQDN.COM
User logon name (pre-Windows 2000): FAKE_USER1
When I log in to Kafka using this user's keytab, I see this in the logs:
[2020-11-21 17:05:50,168] INFO Successfully authenticated client: authenticationID=FAKE_USER1@MYFQDN.COM; authorizationID=kafka/kafka1.myfqdn.com@MYFQDN.COM. (org.apache.kafka.common.security.authenticator.SaslServerCallbackHandler)
[2020-11-21 17:09:50,909] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2020-11-21 17:12:00,672] INFO Successfully authenticated client: authenticationID=FAKE_USER1@MYFQDN.COM; authorizationID=kafka/kafka1.myfqdn.com@MYFQDN.COM. (org.apache.kafka.common.security.authenticator.SaslServerCallbackHandler)
[2020-11-21 17:12:00,772] INFO Successfully authenticated client: authenticationID=FAKE_USER1@MYFQDN.COM; authorizationID=kafka/kafka1.myfqdn.com@MYFQDN.COM. (org.apache.kafka.common.security.authenticator.SaslServerCallbackHandler)
[2020-11-21 17:12:00,799] DEBUG No acl found for resource ResourcePattern(resourceType=CLUSTER, name=kafka-cluster, patternType=LITERAL), authorized = false (kafka.authorizer.logger)
[2020-11-21 17:12:00,799] INFO Principal = User:FAKE_USER1 is Denied Operation = DescribeConfigs from host = xxx.xxx.xxx.xxx on resource = Cluster:LITERAL:kafka-cluster for request = DescribeConfigs with resourceRefCount = 1 (kafka.authorizer.logger)
Of course, the Denied at the end is expected, because my rules extract "kafka" from the kafka/kafka1.myfqdn.com@MYFQDN.COM user.
Could you tell me what I am not doing properly?
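For context, I log in with a standard GSSAPI JAAS entry along these lines (the keytab path is a placeholder; the principal is the one declared above):

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/path/to/kafka.keytab"
  principal="kafka/kafka1.myfqdn.com@MYFQDN.COM";
};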

Setting up an HLF v1.4 network with TLS enabled and Kafka-based ordering

I am creating an HLF v1.4 network with TLS enabled and Kafka-based ordering, but when I try to create a channel it throws an error saying
and when I look at the orderer logs, they show
Configs for TLS in network
Peer Configs
- CORE_PEER_TLS_ENABLED=true
- CORE_PEER_GOSSIP_SKIPHANDSHAKE=true
- CORE_PEER_TLS_CERT_FILE=/etc/hyperledger/crypto/peer/tls/server.crt
- CORE_PEER_TLS_KEY_FILE=/etc/hyperledger/crypto/peer/tls/server.key
- CORE_PEER_TLS_ROOTCERT_FILE=/etc/hyperledger/crypto/peer/tls/ca.crt
Orderer Configs
# enabled TLS
- ORDERER_GENERAL_TLS_ENABLED=true
- ORDERER_GENERAL_TLS_PRIVATEKEY=/etc/hyperledger/crypto/orderer/tls/server.key
- ORDERER_GENERAL_TLS_CERTIFICATE=/etc/hyperledger/crypto/orderer/tls/server.crt
- ORDERER_GENERAL_TLS_ROOTCAS=[/etc/hyperledger/crypto/orderer/tls/ca.crt, /etc/hyperledger/crypto/peer/tls/ca.crt]
Cli Configs
- CORE_PEER_TLS_ENABLED=true
- CORE_PEER_TLS_CERT_FILE=/etc/hyperledger/peer/peers/peer0.org1/tls/server.crt
- CORE_PEER_TLS_KEY_FILE=/etc/hyperledger/peer/peers/peer0.org1/tls/server.key
- CORE_PEER_TLS_ROOTCERT_FILE=/etc/hyperledger/peer/peers/peer0.org1/tls/ca.crt
Can anyone help me in this regard?
As the error says ("bad certificate" while creating a channel), the orderer certificate is not found; that is why you get the bad certificate error.
In the compose .yaml file, set the environment variable
FABRIC_LOGGING_SPEC=DEBUG to see exactly what the error is.
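For example, in the orderer service of the compose file (a sketch; the service name and the rest of the environment are whatever you already have):

orderer.example.com:
  environment:
    - FABRIC_LOGGING_SPEC=DEBUG
    - ORDERER_GENERAL_TLS_ENABLED=true
    # ...the existing TLS settings stay unchanged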