# ask-for-help
b
Ah okie. Yeah I figured it out. I did something like this:
```
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: yatai-role
  namespace: ds-models
rules:
- apiGroups:
  - ""
  - serving.yatai.ai
  - networking.k8s.io
  resources:
  - pods
  - namespaces
  - bentodeployments
  - ingresses
  verbs:
  - get
  - watch
  - list
  - create
  - update
```
```
kubectl create rolebinding yatai-role-binding --user=system:serviceaccount:yatai-system:yatai --role=yatai-role -n ds-models
```
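A quick way to sanity-check that the Role and RoleBinding actually grant access (assuming the same service-account name used in the rolebinding) is `kubectl auth can-i`:

```shell
# Check whether the yatai service account can now list BentoDeployments
# in the ds-models namespace, impersonating the service account from
# the rolebinding above.
kubectl auth can-i list bentodeployments.serving.yatai.ai \
  --as=system:serviceaccount:yatai-system:yatai \
  -n ds-models
```

It prints `yes` or `no`, which is faster than waiting for the operator to fail with a permission error.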
Hmm... but now:
```
2022-10-13T01:59:08.628374348Z Downloading bento ktp_ocr:uipktisj5sjyzz3e tar file from http://yatai.yatai-system.svc.cluster.local/api/v1/bento_repositories/ktp_ocr/bentos/uipktisj5sjyzz3e/download to /tmp/downloaded.tar...
2022-10-13T01:59:08.781018745Z curl: (22) The requested URL returned error: 500
```
If I change yatai.yatai-system.svc.cluster.local to my host, then I can download the bento
maybe i should reinstall from scratch again ...
x
just use
helm upgrade
b
hmmm
that's what I did
should I run the two commands you pasted previously again?
x
No, just upgrade yatai-deployment
b
hmm, what values should I set ?
oh u mean set
```
bentoDeploymentNamespaces: ['yatai']
```
?
x
yes, this value
b
Oki doki. For reference:
```
helm upgrade yatai-deployment bentoml/yatai-deployment -n yatai-deployment --set bentoDeploymentNamespaces=<NEW NAMESPACE> --devel
```
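If the value is a list (as the `['yatai']` example above suggests), more than one namespace can be passed with helm's `--set` list syntax. A sketch, with hypothetical namespace names:

```shell
# Curly braces are helm's list syntax for --set; the quotes keep the
# shell from interpreting the braces. ds-models and yatai are just
# example namespaces here.
helm upgrade yatai-deployment bentoml/yatai-deployment \
  -n yatai-deployment \
  --set 'bentoDeploymentNamespaces={ds-models,yatai}' \
  --devel
```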
Hmm:
```
deploy deployment revision: failed to deploy kube bento deployment: failed to get kube bento deployment: conversion webhook for serving.yatai.ai/v1alpha3, Kind=BentoDeployment failed: no kind "BentoDeployment" is registered for version "serving.yatai.ai/v1alpha3" in scheme "pkg/runtime/scheme.go:100"
```
b
Hmmm same thing
must I restart anything?
x
can you show me the output?
```
kubectl api-resources | grep bento
```
b
```
bentodeployments        serving.yatai.ai/v1alpha3        true         BentoDeployment
```
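Since the error complains that `v1alpha3` isn't registered, it can also help to look at the CRD itself and see which API versions are served and which one is the storage version. A sketch:

```shell
# Print each version declared on the BentoDeployment CRD along with
# its served/storage flags -- a mismatch between what the CRD serves
# and what the operator expects can produce "no kind ... is registered
# for version" errors.
kubectl get crd bentodeployments.serving.yatai.ai \
  -o jsonpath='{range .spec.versions[*]}{.name}{" served="}{.served}{" storage="}{.storage}{"\n"}{end}'
```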
x
yes, you should restart yatai-deployment
b
```
k rollout restart deploy yatai-deployment -n yatai-deployment
```
like this right?
x
yes
b
yeah did that multiple times but it's still the same thing
x
can you show me this output?
```
k -n yatai-deployment get pod
```
b
let me delete the yatai-deployment pod
yea it looks like it's still 14 minutes
```
yatai-deployment-7fc55f647b-zmv4m       1/1     Running     0          14m
yatai-deployment-default-domain-wmfbp   0/1     Completed   0          11h
```
it's taking a long time to delete ...
x
```
k delete pod ... --force --grace-period=0
```
try this
b
lol the pod isn't recreated
even with restarting the deployment
x
```
kubectl -n yatai-deployment describe deploy yatai-deployment
```
b
ok I got the pod back up
```
Normal  ScalingReplicaSet  39m   deployment-controller  Scaled up replica set yatai-deployment-5c79585dff to 1
Normal  ScalingReplicaSet  38m   deployment-controller  Scaled down replica set yatai-deployment-9d7f5974f to 0
Normal  ScalingReplicaSet  32m   deployment-controller  Scaled up replica set yatai-deployment-7c84c88f to 1
Normal  ScalingReplicaSet  31m   deployment-controller  Scaled down replica set yatai-deployment-5c79585dff to 0
Normal  ScalingReplicaSet  29m   deployment-controller  Scaled up replica set yatai-deployment-fbf9f44f7 to 1
Normal  ScalingReplicaSet  28m   deployment-controller  Scaled down replica set yatai-deployment-7c84c88f to 0
Normal  ScalingReplicaSet  25m   deployment-controller  Scaled up replica set yatai-deployment-6f688cc494 to 1
Normal  ScalingReplicaSet  25m   deployment-controller  Scaled down replica set yatai-deployment-fbf9f44f7 to 0
Normal  ScalingReplicaSet  20m   deployment-controller  Scaled up replica set yatai-deployment-7fc55f647b to 1
Normal  ScalingReplicaSet  20m   deployment-controller  (combined from similar events): Scaled down replica set yatai-depl
```
```
deploy deployment revision: failed to deploy kube bento deployment: failed to get kube bento deployment: conversion webhook for serving.yatai.ai/v1alpha3, Kind=BentoDeployment failed: Post "https://yatai-deployment-webhook-service.yatai-deployment.svc:443/convert?timeout=30s": no endpoints available for service "yatai-deployment-webhook-service"
```
this looks different
oh wait
the pod didn't get created after all
```
Normal  ScalingReplicaSet  46m   deployment-controller  Scaled up replica set yatai-deployment-5c79585dff to 1
Normal  ScalingReplicaSet  46m   deployment-controller  Scaled down replica set yatai-deployment-9d7f5974f to 0
Normal  ScalingReplicaSet  39m   deployment-controller  Scaled up replica set yatai-deployment-7c84c88f to 1
Normal  ScalingReplicaSet  39m   deployment-controller  Scaled down replica set yatai-deployment-5c79585dff to 0
Normal  ScalingReplicaSet  36m   deployment-controller  Scaled up replica set yatai-deployment-fbf9f44f7 to 1
Normal  ScalingReplicaSet  36m   deployment-controller  Scaled down replica set yatai-deployment-7c84c88f to 0
Normal  ScalingReplicaSet  32m   deployment-controller  Scaled up replica set yatai-deployment-6f688cc494 to 1
Normal  ScalingReplicaSet  32m   deployment-controller  Scaled down replica set yatai-deployment-fbf9f44f7 to 0
Normal  ScalingReplicaSet  27m   deployment-controller  Scaled up replica set yatai-deployment-7fc55f647b to 1
Normal  ScalingReplicaSet  27m   deployment-controller  (combined from similar events): Scaled down replica set yatai-deployment-6f688cc494 to 0
```
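The "no endpoints available" error earlier usually means no ready pod is backing the webhook Service. A quick way to check, assuming the service and label names from the error message and the restart commands above:

```shell
# Show whether the conversion-webhook Service has any ready endpoints,
# and which operator pods (if any) its selector should be matching.
kubectl -n yatai-deployment get endpoints yatai-deployment-webhook-service
kubectl -n yatai-deployment get pods -l app.kubernetes.io/name=yatai-deployment
```

If the endpoints list is empty while a pod is shown Running, the Service selector and the pod labels are worth comparing.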
x
```
k -n yatai-deployment get rs -l app.kubernetes.io/name=yatai-deployment
```
b
```
yatai-deployment-5b77945db6   0         0         0       11h
yatai-deployment-5c79585dff   0         0         0       53m
yatai-deployment-5ff7446458   0         0         0       146m
yatai-deployment-65f898cc68   0         0         0       11h
yatai-deployment-6f688cc494   0         0         0       39m
yatai-deployment-7c84c88f     0         0         0       46m
yatai-deployment-7fc55f647b   1         1         1       34m
yatai-deployment-9d7f5974f    0         0         0       126m
yatai-deployment-d474f7bb8    0         0         0       11h
yatai-deployment-fbf9f44f7    0         0         0       43m
```
x
can you delete all the rs and then check whether new rs get created?
```
k -n yatai-deployment delete rs -l app.kubernetes.io/name=yatai-deployment
```
```
k -n yatai-deployment get rs -l app.kubernetes.io/name=yatai-deployment
```
b
```
k -n yatai-deployment get rs -l app.kubernetes.io/name=yatai-deployment
No resources found in yatai-deployment namespace.
```
x
can you describe the yatai-deployment deployment?
```
k -n yatai-deployment describe deploy yatai-deployment
```
b
```
Normal  ScalingReplicaSet  60m   deployment-controller  Scaled up replica set yatai-deployment-7c84c88f to 1
Normal  ScalingReplicaSet  60m   deployment-controller  Scaled down replica set yatai-deployment-5c79585dff to 0
Normal  ScalingReplicaSet  57m   deployment-controller  Scaled up replica set yatai-deployment-fbf9f44f7 to 1
Normal  ScalingReplicaSet  57m   deployment-controller  Scaled down replica set yatai-deployment-7c84c88f to 0
Normal  ScalingReplicaSet  53m   deployment-controller  Scaled up replica set yatai-deployment-6f688cc494 to 1
Normal  ScalingReplicaSet  53m   deployment-controller  Scaled down replica set yatai-deployment-fbf9f44f7 to 0
Normal  ScalingReplicaSet  48m   deployment-controller  Scaled up replica set yatai-deployment-7fc55f647b to 1
Normal  ScalingReplicaSet  48m   deployment-controller  (combined from similar events): Scaled down replica set yatai-deployment-6f688cc494 to 0
```
x
I suspect you have some webhooks in your cluster that are constantly changing the deployment resources
```
k -n yatai-deployment rollout status deploy yatai-deployment
```
```
k get events --sort-by=.metadata.creationTimestamp
```
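To see which admission webhooks exist in the cluster (and so which one might be intercepting the Deployment), the webhook configurations can be listed directly; these are the standard Kubernetes resource kinds:

```shell
# Mutating webhooks can silently rewrite Deployments/ReplicaSets;
# validating webhooks can block them. Listing both shows what is
# installed cluster-wide.
kubectl get mutatingwebhookconfigurations
kubectl get validatingwebhookconfigurations
```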
b
Hmm I might have a webhook
lemme disable it
i just have one looking at the yatai-builders namespace
should i delete that as well?
x
do not delete yatai-builders
b
oh I know, I meant I have a webhook that schedules pods from the yatai-builders namespace
```
k -n yatai-deployment rollout status deploy yatai-deployment
W1013 12:38:59.380507   65788 gcp.go:119] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.26+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
Waiting for deployment spec update to be observed...
```
```
k get events --sort-by=.metadata.creationTimestamp -n yatai-deployment
W1013 12:41:59.391593   65954 gcp.go:119] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.26+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
LAST SEEN   TYPE      REASON                   OBJECT                                    MESSAGE
58m         Warning   Unhealthy                pod/yatai-deployment-7fc55f647b-zmv4m     Liveness probe failed: Get "http://10.65.128.154:8081/healthz": dial tcp 10.65.128.154:8081: connect: connection refused
58m         Warning   Unhealthy                pod/yatai-deployment-7fc55f647b-zmv4m     Readiness probe failed: Get "http://10.65.128.154:8081/readyz": dial tcp 10.65.128.154:8081: connect: connection refused
56m         Normal    Killing                  pod/yatai-deployment-7fc55f647b-zmv4m     Stopping container manager
54m         Warning   FailedMount              pod/yatai-deployment-7fc55f647b-zmv4m     Unable to attach or mount volumes: unmounted volumes=[cert kube-api-access-gtg56], unattached volumes=[cert kube-api-access-gtg56]: timed out waiting for the condition
54m         Warning   FailedMount              pod/docker-private-registry-proxy-dg9fd   MountVolume.SetUp failed for volume "kube-api-access-92k5b" : failed to sync configmap cache: timed out waiting for the condition
52m         Normal    Killing                  pod/docker-private-registry-proxy-7pg8n   Stopping container tcp-proxy
22m         Warning   FailedMount              pod/docker-private-registry-proxy-9782x   MountVolume.SetUp failed for volume "kube-api-access-45xn6" : failed to sync configmap cache: timed out waiting for the condition
15m         Normal    Killing                  pod/docker-private-registry-proxy-t6kk7   Stopping container tcp-proxy
14m         Warning   NetworkNotReady          pod/docker-private-registry-proxy-t6kk7   network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
13m         Normal    SandboxChanged           pod/docker-private-registry-proxy-t6kk7   Pod sandbox changed, it will be killed and re-created.
14m         Warning   FailedCreatePodSandBox   pod/docker-private-registry-proxy-t6kk7   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "89b5f7db5744a81af886b37b60c88b9aa8bc9bde0fab09b5a15d2858c6774697" network for pod "docker-private-registry-proxy-t6kk7": networkPlugin cni failed to set up pod "docker-private-registry-proxy-t6kk7_yatai-deployment" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
13m         Warning   FailedCreatePodSandBox   pod/docker-private-registry-proxy-t6kk7   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "778bdc2d6b584580a3eac1bcef1d62fb15e20aa69c80084fafce471a37545a36" network for pod "docker-private-registry-proxy-t6kk7": networkPlugin cni failed to set up pod "docker-private-registry-proxy-t6kk7_yatai-deployment" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
13m         Normal    Pulling                  pod/docker-private-registry-proxy-t6kk7   Pulling image "quay.io/bentoml/proxy-to-service:v2"
13m         Normal    Pulled                   pod/docker-private-registry-proxy-t6kk7   Successfully pulled image "quay.io/bentoml/proxy-to-service:v2" in 7.769060403s
13m         Normal    Created                  pod/docker-private-registry-proxy-t6kk7   Created container tcp-proxy
13m         Normal    Started                  pod/docker-private-registry-proxy-t6kk7   Started container tcp-proxy
```