JupyterHub - log current user

I use a custom logger to log which user is currently performing any action in JupyterHub:
logging_config: dict = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "company": {
            "()": lambda: MyFormatter(user=os.environ.get("JUPYTERHUB_USER", "Unknown"))
        },
    },
    ....
}

c.Application.logging_config = logging_config
Output:
{"asctime": "2022-06-29 14:13:43,773", "level": "WARNING", "name": "JupyterHub", "message": "Updating Hub route http://127.0.0.1:8081 \u2192 http://jupyterhub:8081", "user": "Unknown"
The logger itself works fine, but I am not able to log who was performing the action. In the image I start, there is a JUPYTERHUB_USER env variable available. This seems to get passed in from JupyterHub (I don't know exactly how this is done), but inside JupyterHub itself I don't have this variable available.
Is there a way to use it in JupyterHub itself, not just in the JupyterLab container?
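(MyFormatter is not shown in the question; purely as an illustration, a JSON formatter that injects a fixed user field could look roughly like the following. The field names mirror the output above; the class body itself is hypothetical, not the asker's actual code.)

import json
import logging


class MyFormatter(logging.Formatter):
    """Hypothetical JSON formatter that adds a fixed 'user' field to every record."""

    def __init__(self, user: str = "Unknown"):
        super().__init__()
        self.user = user

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "asctime": self.formatTime(record),
            "level": record.levelname,
            "name": record.name,
            "message": record.getMessage(),
            "user": self.user,
        })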

This doesn't get you all the way there, but it's a start: we add extra pod annotations/labels through KubeSpawner's extra_annotations using the cluster_options hook (see our Helm chart for our complete daskhub setup):
dask-gateway:
  gateway:
    extraConfig:
      optionHandler: |
        from dask_gateway_server.options import Options, String, Select, Mapping, Float, Bool
        from math import ceil

        def cluster_options(user):
            def option_handler(options):
                extra_annotations = {
                    "hub.jupyter.org/username": user.name
                }
                default_extra_labels = {
                    "hub.jupyter.org/username": user.name,
                }
                ...
            return Options(
                Select(
                    ...
                ),
                ...,
                handler=option_handler,
            )

        c.Backend.cluster_options = cluster_options
You can then poll pods with these labels to get real-time usage, as sketched below. There may be a more direct way to do this though; not sure.
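As an illustration (not part of the original answer), polling pods by that label with the official Kubernetes Python client could look roughly like this; the namespace and the helper function name are assumptions:

from kubernetes import client, config

# Load in-cluster credentials when running inside the cluster;
# fall back to the local kubeconfig otherwise.
try:
    config.load_incluster_config()
except config.ConfigException:
    config.load_kube_config()

v1 = client.CoreV1Api()


def pods_for_user(username: str, namespace: str = "daskhub"):
    """Return the names of pods labelled with the given JupyterHub username."""
    pods = v1.list_namespaced_pod(
        namespace,
        label_selector=f"hub.jupyter.org/username={username}",
    )
    return [p.metadata.name for p in pods.items]


print(pods_for_user("some-user"))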


Proper way to test Chalice app with env variables

I have a Chalice API project with the following config under .chalice/config.json:
{
"version": "2.0",
"app_name": "my-app",
"manage_iam_role": true,
"stages": {
"local": {
"api_gateway_stage": "local",
"environment_variables": {
"STAGE_NAME": "local",
"MY_ENV_VAR": "value-local"
}
},
"staging": {
"api_gateway_stage": "stg",
"environment_variables": {
"STAGE_NAME": "staging",
"MY_ENV_VAR": "value-staging"
}
},
"production": {
"api_gateway_stage": "prod",
"environment_variables": {
"STAGE_NAME": "production",
"MY_ENV_VAR": "value-prod"
}
}
}
}
The app code uses some env variables like MY_ENV_VAR, loaded with os.environ["MY_ENV_VAR"].
The tests look like the following (let's say in a tests/test_my_endpoint.py file):
from http import HTTPStatus
from app import app
from chalice.test import Client


def test_my_endpoint() -> None:
    with Client(app, stage_name="local") as client:
        response = client.http.get("/my-endpoint")
        assert response.status_code == HTTPStatus.OK
But it seems stage_name="local" doesn't mean that the variables for the local stage are loaded for the tests, so during test execution the env variables are not found and the tests fail. I'm not sure how setting this argument on chalice.test.Client is useful, then.
For testing locally, I can always load the environment variables myself, but for CI/CD I would like the tests to use the env variables of a stage defined in .chalice/config.json. The Chalice documentation isn't very clear about the good practice here, or about how exactly the stage_name argument works.
How can I fix that, and generally speaking, what is the proper way to write tests for a Chalice app that uses env variables?
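As an illustration only (not from the question): one common workaround is to read the stage's environment_variables out of .chalice/config.json in a pytest fixture before the app is exercised. The fixture below is a sketch under that assumption, not a built-in Chalice feature, and it assumes tests run from the project root.

import json
from pathlib import Path

import pytest


@pytest.fixture(autouse=True)
def load_stage_env(monkeypatch):
    """Push the 'local' stage's environment_variables into os.environ for each test."""
    config = json.loads(Path(".chalice/config.json").read_text())
    stage_env = config["stages"]["local"].get("environment_variables", {})
    for key, value in stage_env.items():
        monkeypatch.setenv(key, value)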

CannotPullContainerError: failed to extract layer

I'm trying to run a task in a Windows container on AWS Fargate.
The container is a .NET console application (full framework 4.5).
This is the task definition, generated programmatically with the SDK:
var taskResponse = await ecsClient.RegisterTaskDefinitionAsync(new Amazon.ECS.Model.RegisterTaskDefinitionRequest()
{
    RequiresCompatibilities = new List<string>() { "FARGATE" },
    TaskRoleArn = TASK_ROLE_ARN,
    ExecutionRoleArn = EXECUTION_ROLE_ARN,
    Cpu = CONTAINER_CPU.ToString(),
    Memory = CONTAINER_MEMORY.ToString(),
    NetworkMode = NetworkMode.Awsvpc,
    Family = "netfullframework45consoleapp-task-definition",
    EphemeralStorage = new EphemeralStorage() { SizeInGiB = EPHEMERAL_STORAGE_SIZE_GIB },
    ContainerDefinitions = new List<Amazon.ECS.Model.ContainerDefinition>()
    {
        new Amazon.ECS.Model.ContainerDefinition()
        {
            Name = "netfullframework45consoleapp-task-definition",
            Image = "XXXXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/netfullframework45consoleapp:latest",
            Cpu = CONTAINER_CPU,
            Memory = CONTAINER_MEMORY,
            Essential = true
            //I REMOVED THE LOG DEFINITION TO SIMPLIFY THE PROBLEM
            //,
            //LogConfiguration = new Amazon.ECS.Model.LogConfiguration()
            //{
            //    LogDriver = LogDriver.Awslogs,
            //    Options = new Dictionary<string, string>()
            //    {
            //        { "awslogs-create-group", "true"},
            //        { "awslogs-group", $"/ecs/{TASK_DEFINITION_NAME}" },
            //        { "awslogs-region", AWS_REGION },
            //        { "awslogs-stream-prefix", $"{TASK_DEFINITION_NAME}" }
            //    }
            //}
        }
    }
});
These are the policies attached to the task execution role (AmazonECSTaskExecutionRolePolicy):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}
I get this error when launching the task:
CannotPullContainerError: ref pull has been retried 1 time(s): failed to extract layer sha256:fe48cee89971abac42eedb9110b61867659df00fc5b0b90dd91d6e19f704d935: link /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/212/fs/Files/ProgramData/Microsoft/Event Viewer/Views/ServerRoles/RemoteDesktop.Events.xml /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/212/fs/Files/Windows/Microsoft.NET/assembly/GAC_64/Microsoft.Windows.ServerManager.RDSPlugin/v4.0_10.0.0.0__31bf3856ad364e35/RemoteDesktop.Events.xml: no such file or directory: unknown
Some searching led me here:
https://aws.amazon.com/it/premiumsupport/knowledge-center/ecs-pull-container-api-error-ecr/
Point 1 says that if I run the task in a private subnet (like I'm doing) I need a NAT with the related route to guarantee communication towards ECR, but
note that in my infrastructure I have a VPC endpoint to ECR....
So the first question is: is a VPC endpoint sufficient to guarantee communication from the container to the container image registry (ECR)? Or do I necessarily need to implement what point 1 says (a NAT and a route on the route table), or alternatively run the task in a public subnet?
Could the error be related to missing communication towards ECR, or could it be a missing policy problem?
Make sure your VPC endpoint is configured correctly. Note that
"Amazon ECS tasks hosted on Fargate using platform version 1.4.0 or later require both the com.amazonaws.region.ecr.dkr and com.amazonaws.region.ecr.api Amazon ECR VPC endpoints as well as the Amazon S3 gateway endpoint to take advantage of this feature."
See https://docs.aws.amazon.com/AmazonECR/latest/userguide/vpc-endpoints.html for more information.
From the first paragraph of the page I linked: "You don't need an internet gateway, a NAT device, or a virtual private gateway."
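(Not part of the original answer: one quick way to check which of those endpoints actually exist in your VPC is to list them with boto3. The VPC ID below is a placeholder.)

import boto3

# Hypothetical values - replace with your own VPC ID and region.
VPC_ID = "vpc-0123456789abcdef0"
REGION = "eu-west-1"

ec2 = boto3.client("ec2", region_name=REGION)

endpoints = ec2.describe_vpc_endpoints(
    Filters=[{"Name": "vpc-id", "Values": [VPC_ID]}]
)["VpcEndpoints"]

found = {ep["ServiceName"] for ep in endpoints}

# Fargate platform 1.4.0+ needs all three of these (per the AWS docs quoted above).
required = {
    f"com.amazonaws.{REGION}.ecr.dkr",
    f"com.amazonaws.{REGION}.ecr.api",
    f"com.amazonaws.{REGION}.s3",
}

for service in sorted(required):
    status = "OK" if service in found else "MISSING"
    print(f"{service}: {status}")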

Isolation between kubernetes.admission policies in OPA

I use the vanilla Open Policy Agent as a deployment on Kubernetes for handling admission webhooks.
The behavior of multiple policies evaluation is not clear to me, see this example:
## policy-1.rego
package kubernetes.admission

check_namespace {
    # evaluate to true
    namespaces := {"namespace1"}
    namespaces[input.request.namespace]
}

check_user {
    # evaluate to false
    users := {"user1"}
    users[input.request.userInfo.username]
}

allow["yes - user1 and namespace1"] {
    check_namespace
    check_user
}
.
## policy-2.rego
package kubernetes.admission

check_namespace {
    # evaluate to false
    namespaces := {"namespace2"}
    namespaces[input.request.namespace]
}

check_user {
    # evaluate to true
    users := {"user2"}
    users[input.request.userInfo.username]
}

allow["yes - user2 and namespace2"] {
    check_namespace
    check_user
}
.
## main.rego
package system

import data.kubernetes.admission

main = {
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "response": response,
}

default uid = ""

uid = input.request.uid

response = {
    "allowed": true,
    "uid": uid,
} {
    reason = concat(", ", admission.allow)
    reason != ""
}
else = {"allowed": false, "uid": uid}
.
## example input
{
  "apiVersion": "admission.k8s.io/v1beta1",
  "kind": "AdmissionReview",
  "request": {
    "namespace": "namespace1",
    "userInfo": {
      "username": "user2"
    }
  }
}
.
## Results
"allow": [
    "yes - user1 and namespace1",
    "yes - user2 and namespace2"
]
It seems that all of my policies are being evaluated as if they were one flat file, but I would expect each policy to be evaluated independently of the others.
What am I missing here?
Files don't really mean anything to OPA, but packages do. Since both of your policies are defined in the kubernetes.admission package, they'll essentially be appended together as one. This works in your case only because, given your input, one of check_user and check_namespace in each file evaluates to undefined. If they hadn't, you would see an error about a conflict, since complete rules can't evaluate to different results (i.e. allow can't be both true and false).
If you instead use a separate package per policy, like, say, kubernetes.admission.policy1 and kubernetes.admission.policy2, this would not be a concern. You'd need to update your main policy to collect an aggregate of the allow rules from all of your policies, though. Something like:
reason = concat(", ", [a | a := data.kubernetes.admission[policy].allow[_]])
This would iterate over all the sub-packages in kubernetes.admission and collect the allow rule result from each. This pattern is called dynamic policy composition, and I wrote a longer text on the topic here.
(As a side note, you probably want to aggregate deny rules rather than allow. As far as I know, clients like kubectl won't print out the reason from the response unless it's actually denied... and it's generally less useful to know why something succeeded rather than failed. You'll still have the OPA decision logs to consult if you want to know more about why a request succeeded or failed later).

Not able to retrieve RedShift cluster Capacity details like Storage, Memory using Python script

I have tried to fetch my Redshift cluster details. I'm able to see many details about the cluster, but a few details are missing.
For example, details like Storage and Memory.
Below is the code:
redshiftClient = boto3.client('redshift',
                              aws_access_key_id=role.credentials.access_key,
                              aws_secret_access_key=role.credentials.secret_key,
                              aws_session_token=role.credentials.session_token,
                              region_name='us-west-2')

# Getting all the clusters
clusters = redshiftClient.describe_clusters()
Can you please suggest a way to get those details?
Thanks.
The describe-clusters command does not return that type of information. The output of that command is:
{
    "Clusters": [
        {
            "NodeType": "dw.hs1.xlarge",
            "Endpoint": {
                "Port": 5439,
                "Address": "mycluster.coqoarplqhsn.us-east-1.redshift.amazonaws.com"
            },
            "ClusterVersion": "1.0",
            "PubliclyAccessible": "true",
            "MasterUsername": "adminuser",
            "ClusterParameterGroups": [
                {
                    "ParameterApplyStatus": "in-sync",
                    "ParameterGroupName": "default.redshift-1.0"
                }
            ],
            "ClusterSecurityGroups": [
                {
                    "Status": "active",
                    "ClusterSecurityGroupName": "default"
                }
            ],
            "AllowVersionUpgrade": true,
            "VpcSecurityGroups": [],
            "AvailabilityZone": "us-east-1a",
            "ClusterCreateTime": "2013-01-22T21:59:29.559Z",
            "PreferredMaintenanceWindow": "sat:03:30-sat:04:00",
            "AutomatedSnapshotRetentionPeriod": 1,
            "ClusterStatus": "available",
            "ClusterIdentifier": "mycluster",
            "DBName": "dev",
            "NumberOfNodes": 2,
            "PendingModifiedValues": {}
        }
    ],
    "ResponseMetadata": {
        "RequestId": "65b71cac-64df-11e2-8f5b-e90bd6c77476"
    }
}
You will need to retrieve Memory and Storage statistics from Amazon CloudWatch.
See your other question: Amazon CloudWatch is not returning Redshift metrics
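(As an illustration, not from the original answer: current disk usage, for example, can be pulled from the AWS/Redshift CloudWatch namespace via the PercentageDiskSpaceUsed metric. The cluster identifier below matches the sample output above; the region and time window are assumptions.)

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-west-2')

# PercentageDiskSpaceUsed is one of the metrics Redshift publishes to CloudWatch.
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/Redshift',
    MetricName='PercentageDiskSpaceUsed',
    Dimensions=[{'Name': 'ClusterIdentifier', 'Value': 'mycluster'}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=['Average'],
)

for point in sorted(response['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Average'])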
If you actually want to retrieve information about a standard cluster (that is, the amount of storage and memory assigned to each node, rather than current memory and storage usage), that is not available from an API call. Instead see: Amazon Redshift Clusters

Best way in Kubernetes to receive notifications of errors when instantiating resources from .json or .yaml specifications?

I am using fabric8 to develop a cluster management layer on top of Kubernetes, and I am confused as to what the 'official' API is for obtaining notifications of errors when things go wrong while instantiating pods/replication controllers/services etc.
In the section "Pod Deployment Code" below I have a stripped-down version of what we do for pods. In the event that everything goes correctly, our code is fine. We rely on setting 'watches', as you can see in the method deployPodWithWatch. All I do in the given eventReceived callback is print the event, but our real code will break apart a notification like this:
got action:
MODIFIED /
Pod(apiVersion=v1, kind=Pod, metadata=...etc etc
status=PodStatus(
conditions=[
and pick out the 'status' element of the Pod; when we get PodCondition(status=True, type=Ready), we know that our pod has been successfully deployed.
In the happy-path case this works great. You can actually run the code supplied, with the variable k8sUrl set to the proper URL for your site (hopefully your k8s installation does not require auth, which is site specific, so I didn't provide code for that).
However, suppose you change the variable imageName to "nginBoo". There is no public Docker image of that name, so after you run the code, set your Kubernetes context to the namespace "junk", and do a
describe pod podboy
you will see two status messages at the end with the following values for Reason / Message:
Reason        Message
failedSync    Error syncing pod, skipping...
failed        Failed to pull image "nginBoo": API error (500):
              Error parsing reference: "nginBoo"
              is not a valid repository/tag
I would like to implement a watch callback so that it catches these types of errors. However, the only thing that I see are 'MODIFIED' events wherein the Pod has a field like this:
state=ContainerState(running=null, terminated=null,
    waiting=ContainerStateWaiting(
        reason=API error (500):
            Error parsing reference:
            "nginBoo" is not a valid repository/tag
I suppose I could look for a reason code that contains the string 'API error', but this seems to be very much an implementation-dependent hack -- it might not cover all cases, and maybe it will change under my feet in future versions. I'd like some more 'official' way of figuring out if there was an error, but my searches have come up dry -- so I humbly request guidance from all of you k8s experts out there. Thanks!
Pod Deployment Code
import com.fasterxml.jackson.databind.ObjectMapper
import scala.collection.JavaConverters._
import com.ning.http.client.ws.WebSocket
import com.typesafe.scalalogging.StrictLogging
import io.fabric8.kubernetes.api.model.{DoneableNamespace, Namespace, Pod, ReplicationController}
import io.fabric8.kubernetes.client.DefaultKubernetesClient.ConfigBuilder
import io.fabric8.kubernetes.client.Watcher.Action
import io.fabric8.kubernetes.client.dsl.Resource
import io.fabric8.kubernetes.client.{DefaultKubernetesClient, Watcher}

object ErrorTest extends App with StrictLogging {

  // corresponds to --insecure-skip-tls-verify=true, according to io.fabric8.kubernetes.api.model.Cluster
  val trustCerts = true
  val k8sUrl = "http://localhost:8080"
  val namespaceName = "junk" // replace this with name of a namespace that you know exists
  val imageName: String = "nginx"

  def go(): Unit = {
    val kube = getConnection
    dumpNamespaces(kube)
    deployPodWithWatch(kube, getPod(image = imageName))
  }

  def deployPodWithWatch(kube: DefaultKubernetesClient, pod: Pod): Unit = {
    kube.pods().inNamespace(namespaceName).create(pod) /* create the pod ! */
    val podWatchWebSocket: WebSocket =                 /* create watch on the pod */
      kube.pods().inNamespace(namespaceName).withName(pod.getMetadata.getName).watch(getPodWatch)
  }

  def getPod(image: String): Pod = {
    val jsonTemplate =
      """
        |{
        |  "kind": "Pod",
        |  "apiVersion": "v1",
        |  "metadata": {
        |    "name": "podboy",
        |    "labels": {
        |      "app": "nginx"
        |    }
        |  },
        |  "spec": {
        |    "containers": [
        |      {
        |        "name": "podboy",
        |        "image": "<image>",
        |        "ports": [
        |          {
        |            "containerPort": 80,
        |            "protocol": "TCP"
        |          }
        |        ]
        |      }
        |    ]
        |  }
        |}
      """.stripMargin
    val replacement: String = "image\": \"" + image
    val json = jsonTemplate.replaceAll("image\": \"<image>", replacement)
    System.out.println("json:" + json);
    new ObjectMapper().readValue(json, classOf[Pod])
  }

  def dumpNamespaces(kube: DefaultKubernetesClient): Unit = {
    val namespaceNames = kube.namespaces().list().getItems.asScala.map {
      (ns: Namespace) => {
        ns.getMetadata.getName
      }
    }
    System.out.println("namespaces are:" + namespaceNames);
  }

  def getConnection = {
    val configBuilder = new ConfigBuilder()
    val config =
      configBuilder.
        trustCerts(trustCerts).
        masterUrl(k8sUrl).
        build()
    new DefaultKubernetesClient(config)
  }

  def getPodWatch: Watcher[Pod] = {
    new Watcher[Pod]() {
      def eventReceived(action: Action, watchedPod: Pod) {
        System.out.println("got action: " + action + " / " + watchedPod)
      }
    }
  }

  go()
}
I'd suggest you have a look at events; see this topic for some guidance. Generally, each object should generate events you can watch to be notified of such errors.
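(Illustration only, not from the original answer: the same idea with the official Kubernetes Python client, watching Event objects for the pod from the question. The namespace and pod name mirror the question's example; the exact event reasons you see vary between Kubernetes versions.)

from kubernetes import client, config, watch

config.load_kube_config()  # or config.load_incluster_config() inside a cluster
v1 = client.CoreV1Api()

namespace = "junk"   # from the question
pod_name = "podboy"  # from the question

w = watch.Watch()
# Stream Event objects that involve the pod; failed image pulls typically surface
# here with reasons such as "Failed" or "BackOff" and a descriptive message.
for item in w.stream(
    v1.list_namespaced_event,
    namespace=namespace,
    field_selector=f"involvedObject.name={pod_name}",
):
    event = item["object"]
    print(event.reason, "-", event.message)
    if event.type == "Warning":
        print("something went wrong deploying", pod_name)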