Slackbot
10/17/2022, 9:34 AM
Xipeng Guan
10/17/2022, 9:38 AM
kubectl -n yatai get bentodeployment xxx -o yaml
Benjamin Tan
10/17/2022, 9:39 AM
apiVersion: serving.yatai.ai/v1alpha3
kind: BentoDeployment
metadata:
  name: ktp-ocr
  namespace: yatai
spec:
  extra_pod_spec:
    # nodeSelector:
    #   cloud.google.com/gke-nodepool: "high-cpu-preemptible"
    # tolerations:
    # - key: "high-cpu-preemptible"
    #   operator: "Exists"
    #   effect: "NoSchedule"
  bento_tag: ktp_ocr:uipktisj5sjyzz3e
  ingress:
    enabled: true
  envs:
  - key: PORT
    value: "3000"
  resources:
    limits:
      cpu: 1000m
    requests:
      cpu: 500m
  autoscaling:
    min_replicas: 2
    max_replicas: 5
  runners:
  - name: face_image_cropper
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
        gpu: ''
      limits:
        cpu: "1000m"
        memory: "1024Mi"
    autoscaling:
      min_replicas: 2
      max_replicas: 5
    extra_pod_spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: "high-cpu-preemptible"
      tolerations:
      - key: "high-cpu-preemptible"
        operator: "Exists"
        effect: "NoSchedule"
  - name: card_detector
    resources:
      requests:
        cpu: "4000m"
        memory: "4096Mi"
      limits:
        cpu: "4000m"
        memory: "6144Mi"
    autoscaling:
      min_replicas: 2
      max_replicas: 3
    extra_pod_spec:
      # nodeSelector:
      #   cloud.google.com/gke-nodepool: "high-cpu-preemptible"
      # tolerations:
      # - key: "high-cpu-preemptible"
      #   operator: "Exists"
      #   effect: "NoSchedule"
  - name: orientation_fixer
    resources:
      requests:
        cpu: "500m"
        memory: "1024Mi"
      limits:
        cpu: "1000m"
        memory: "2048Mi"
    autoscaling:
      min_replicas: 2
      max_replicas: 3
    extra_pod_spec:
      # nodeSelector:
      #   cloud.google.com/gke-nodepool: "high-cpu-preemptible"
      # tolerations:
      # - key: "high-cpu-preemptible"
      #   operator: "Exists"
      #   effect: "NoSchedule"
  - name: ocr_runnable
    resources:
      requests:
        cpu: "4000m"
        memory: "4096Mi"
      limits:
        cpu: "8000m"
        memory: "6144Mi"
    autoscaling:
      min_replicas: 2
      max_replicas: 5
    extra_pod_spec:
      # nodeSelector:
      #   cloud.google.com/gke-nodepool: "high-cpu-preemptible"
      # tolerations:
      # - key: "high-cpu-preemptible"
      #   operator: "Exists"
      #   effect: "NoSchedule"
Xipeng Guan
10/17/2022, 9:40 AM
Benjamin Tan
10/17/2022, 9:41 AM
Xipeng Guan
10/17/2022, 9:41 AM
kubectl -n yatai get deploy ktp-ocr -o yaml
Benjamin Tan
10/17/2022, 9:42 AM
status:
  conditions:
  - lastTransitionTime: "2022-10-17T09:42:04Z"
    lastUpdateTime: "2022-10-17T09:42:04Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2022-10-17T09:42:04Z"
    lastUpdateTime: "2022-10-17T09:42:04Z"
    message: ReplicaSet "ktp-ocr-796b8f76d7" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing
  observedGeneration: 1
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
Benjamin Tan
10/17/2022, 9:42 AM
Benjamin Tan
10/17/2022, 9:42 AM
ktp-ocr-796b8f76d7-457hc            0/2     Pending   0          6s
ktp-ocr-796b8f76d7-b6xwg            0/2     Pending   0          22s
ktp-ocr-runner-0-5466455548-6t6g2   0/2     Pending   0          22s
ktp-ocr-runner-0-5466455548-nxfdv   0/2     Pending   0          7s
ktp-ocr-runner-1-595586d66c-ltd4k   0/2     Pending   0          6s
ktp-ocr-runner-1-595586d66c-p4clw   0/2     Pending   0          22s
ktp-ocr-runner-2-88fc99b6f-vfmtz    0/2     Pending   0          22s
ktp-ocr-runner-2-88fc99b6f-wrtvn    0/2     Pending   0          6s
ktp-ocr-runner-3-6999fd97f9-6pcqt   0/2     Pending   0          22s
ktp-ocr-runner-3-6999fd97f9-vlpck   0/2     Pending   0          6s
Benjamin Tan
10/17/2022, 9:42 AM
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
NAME                                READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
ktp-ocr-796b8f76d7-457hc            0/2     Pending   0          30s   <none>   <none>   <none>           <none>
ktp-ocr-796b8f76d7-b6xwg            0/2     Pending   0          46s   <none>   <none>   <none>           <none>
ktp-ocr-runner-0-5466455548-6t6g2   0/2     Pending   0          46s   <none>   <none>   <none>           <none>
ktp-ocr-runner-0-5466455548-nxfdv   0/2     Pending   0          31s   <none>   <none>   <none>           <none>
ktp-ocr-runner-1-595586d66c-ltd4k   0/2     Pending   0          30s   <none>   <none>   <none>           <none>
ktp-ocr-runner-1-595586d66c-p4clw   0/2     Pending   0          46s   <none>   <none>   <none>           <none>
ktp-ocr-runner-2-88fc99b6f-vfmtz    0/2     Pending   0          46s   <none>   <none>   <none>           <none>
ktp-ocr-runner-2-88fc99b6f-wrtvn    0/2     Pending   0          30s   <none>   <none>   <none>           <none>
ktp-ocr-runner-3-6999fd97f9-6pcqt   0/2     Pending   0          46s   <none>   <none>   <none>           <none>
ktp-ocr-runner-3-6999fd97f9-vlpck   0/2     Pending   0          30s   <none>   <none>   <none>           <none>
Xipeng Guan
10/17/2022, 9:45 AM
kubectl -n yatai describe pod ktp-ocr-796b8f76d7-b6xwg
Benjamin Tan
10/17/2022, 9:45 AM
Benjamin Tan
10/17/2022, 9:45 AM
Warning FailedScheduling 4s (x10 over 95s) default-scheduler 0/79 nodes are available: 1 node(s) had taint {monitoring: true}, that the pod didn't tolerate, 3 node(s) had taint {airflowexecutors: true}, that the pod didn't tolerate, 3 node(s) had taint {jupyter-notebooks: true}, that the pod didn't tolerate, 3 node(s) had taint {kafka-connect: true}, that the pod didn't tolerate, 3 node(s) had taint {kubeflow-pipelines: true}, that the pod didn't tolerate, 3 node(s) had taint {preemptible: true}, that the pod didn't tolerate, 56 node(s) didn't match Pod's node affinity/selector, 7 node(s) had taint {high-cpu-preemptible: true}, that the pod didn't tolerate.
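One way to read a FailedScheduling message like the one above is to tally the node count behind each reason. This is only a minimal Python sketch: the helper name `summarize_scheduling_failure` is made up for illustration, and the parsing assumes the `N <reason>, N <reason>, ...` layout seen in this particular event.

```python
import re


def summarize_scheduling_failure(message):
    """Return {reason: node_count} for a default-scheduler FailedScheduling message."""
    # Keep only the part after "0/79 nodes are available:".
    _, _, reasons = message.partition("nodes are available:")
    summary = {}
    # Split only on commas that are followed by a node count, because the taint
    # reasons themselves contain a comma ("..., that the pod didn't tolerate").
    for piece in re.split(r",\s+(?=\d+\s)", reasons.strip().rstrip(".")):
        match = re.match(r"(\d+)\s+(.*)", piece.strip())
        if match:
            summary[match.group(2)] = int(match.group(1))
    return summary


# Message text copied from the describe output above.
message = (
    "0/79 nodes are available: "
    "1 node(s) had taint {monitoring: true}, that the pod didn't tolerate, "
    "3 node(s) had taint {airflowexecutors: true}, that the pod didn't tolerate, "
    "3 node(s) had taint {jupyter-notebooks: true}, that the pod didn't tolerate, "
    "3 node(s) had taint {kafka-connect: true}, that the pod didn't tolerate, "
    "3 node(s) had taint {kubeflow-pipelines: true}, that the pod didn't tolerate, "
    "3 node(s) had taint {preemptible: true}, that the pod didn't tolerate, "
    "56 node(s) didn't match Pod's node affinity/selector, "
    "7 node(s) had taint {high-cpu-preemptible: true}, that the pod didn't tolerate."
)

summary = summarize_scheduling_failure(message)
for reason, count in sorted(summary.items(), key=lambda item: -item[1]):
    print(f"{count:3d}  {reason}")
```

The counts add up to all 79 nodes. The 7 `high-cpu-preemptible` nodes are rejected only because of their taint, which is what a matching toleration in `extra_pod_spec` addresses.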
Xipeng Guan
10/17/2022, 9:46 AM
tolerations in extra_pod_spec
Benjamin Tan
10/17/2022, 9:47 AM
extra_pod_spec:
  # nodeSelector:
  #   cloud.google.com/gke-nodepool: "high-cpu-preemptible"
  # tolerations:
  # - key: "high-cpu-preemptible"
  #   operator: "Exists"
  #   effect: "NoSchedule"
Benjamin Tan
10/17/2022, 9:47 AM
Xipeng Guan
10/17/2022, 9:47 AM
kubectl -n yatai get deploy ktp-ocr -o yaml
Benjamin Tan
10/17/2022, 9:48 AM
Benjamin Tan
10/17/2022, 9:50 AM
Benjamin Tan
10/17/2022, 9:50 AM
apiVersion: serving.yatai.ai/v1alpha3
kind: BentoDeployment
metadata:
  name: ktp-ocr
  namespace: yatai
spec:
  extra_pod_spec:
    nodeSelector:
      cloud.google.com/gke-nodepool: "high-cpu-preemptible"
    tolerations:
    - key: "high-cpu-preemptible"
      operator: "Exists"
      effect: "NoSchedule"
  bento_tag: ktp_ocr:uipktisj5sjyzz3e
  ingress:
    enabled: true
  resources:
    limits:
      cpu: 1000m
    requests:
      cpu: 500m
  autoscaling:
    min_replicas: 2
    max_replicas: 5
  runners:
  - name: face_image_cropper
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
        gpu: ''
      limits:
        cpu: "1000m"
        memory: "1024Mi"
    autoscaling:
      min_replicas: 2
      max_replicas: 5
    extra_pod_spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: "high-cpu-preemptible"
      tolerations:
      - key: "high-cpu-preemptible"
        operator: "Exists"
        effect: "NoSchedule"
  - name: card_detector
    resources:
      requests:
        cpu: "4000m"
        memory: "4096Mi"
      limits:
        cpu: "4000m"
        memory: "6144Mi"
    autoscaling:
      min_replicas: 2
      max_replicas: 3
    extra_pod_spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: "high-cpu-preemptible"
      tolerations:
      - key: "high-cpu-preemptible"
        operator: "Exists"
        effect: "NoSchedule"
  - name: orientation_fixer
    resources:
      requests:
        cpu: "500m"
        memory: "1024Mi"
      limits:
        cpu: "1000m"
        memory: "2048Mi"
    autoscaling:
      min_replicas: 2
      max_replicas: 3
    extra_pod_spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: "high-cpu-preemptible"
      tolerations:
      - key: "high-cpu-preemptible"
        operator: "Exists"
        effect: "NoSchedule"
  - name: ocr_runnable
    resources:
      requests:
        cpu: "4000m"
        memory: "4096Mi"
      limits:
        cpu: "8000m"
        memory: "6144Mi"
    autoscaling:
      min_replicas: 2
      max_replicas: 5
    extra_pod_spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: "high-cpu-preemptible"
      tolerations:
      - key: "high-cpu-preemptible"
        operator: "Exists"
        effect: "NoSchedule"
Benjamin Tan
10/17/2022, 9:50 AM
k get po -n yatai -o wide
W1017 17:50:17.794983 263147 gcp.go:119] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.26+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
No resources found in yatai namespace.
Xipeng Guan
10/17/2022, 9:51 AM
kubectl -n yatai-deployment logs -f deploy/yatai-deployment
Benjamin Tan
10/17/2022, 9:51 AM
1.666000212201634e+09 INFO finished cleaning up abandoned runner services {"func": "doCleanUpAbandonedRunnerServices"}
1.666000212319764e+09 INFO getting yatai client {"func": "doBuildBentoImages"}
1.6660002123202236e+09 INFO getting docker registry {"func": "doBuildBentoImages"}
1.6660002123243196e+09 INFO listing bentos from yatai {"func": "doBuildBentoImages"}
1.66600021238845e+09 INFO found 1 bentos need to build image {"func": "doBuildBentoImages"}
1.666000212388535e+09 INFO checking image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
1.66600021241183e+09 INFO image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
1.6660002422009413e+09 INFO start cleaning up abandoned runner services {"func": "doCleanUpAbandonedRunnerServices"}
1.6660002422011545e+09 INFO finished cleaning up abandoned runner services {"func": "doCleanUpAbandonedRunnerServices"}
1.666000242319438e+09 INFO getting yatai client {"func": "doBuildBentoImages"}
1.6660002423198652e+09 INFO getting docker registry {"func": "doBuildBentoImages"}
1.6660002423249292e+09 INFO listing bentos from yatai {"func": "doBuildBentoImages"}
1.6660002425216725e+09 INFO found 1 bentos need to build image {"func": "doBuildBentoImages"}
1.6660002425217624e+09 INFO checking image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
1.6660002425406697e+09 INFO image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
1.6660002722010727e+09 INFO start cleaning up abandoned runner services {"func": "doCleanUpAbandonedRunnerServices"}
1.666000272201281e+09 INFO finished cleaning up abandoned runner services {"func": "doCleanUpAbandonedRunnerServices"}
1.6660002723201852e+09 INFO getting yatai client {"func": "doBuildBentoImages"}
1.6660002723205574e+09 INFO getting docker registry {"func": "doBuildBentoImages"}
1.6660002723253105e+09 INFO listing bentos from yatai {"func": "doBuildBentoImages"}
1.6660002724527626e+09 INFO found 1 bentos need to build image {"func": "doBuildBentoImages"}
1.6660002724528313e+09 INFO checking image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
1.666000272472387e+09 INFO image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
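The leading number on each of these log lines looks like Unix epoch seconds in scientific notation. Assuming that is the case (it lines up with the chat timestamps), a small Python helper can turn it into a readable UTC time; the name `log_time_utc` is only illustrative.

```python
from datetime import datetime, timezone


def log_time_utc(raw):
    """Convert an epoch-seconds log prefix such as '1.666000212201634e+09' to a UTC datetime."""
    return datetime.fromtimestamp(float(raw), tz=timezone.utc)


# First line of the log excerpt above.
print(log_time_utc("1.666000212201634e+09").isoformat(timespec="seconds"))
# prints 2022-10-17T09:50:12+00:00
```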
Benjamin Tan
10/17/2022, 9:52 AM
Xipeng Guan
10/17/2022, 9:53 AM
kubectl -n yatai describe deploy ktp-ocr
Benjamin Tan
10/17/2022, 9:54 AM
kubectl -n yatai describe deploy ktp-ocr
W1017 17:53:54.578482 266435 gcp.go:119] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.26+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
Error from server (NotFound): deployments.apps "ktp-ocr" not found
Benjamin Tan
10/17/2022, 9:54 AM
k get bentodeployment -n yatai
W1017 17:54:14.034529 266712 gcp.go:119] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.26+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
NAME      BENTO                      READY   MINREPLICAS   MAXREPLICAS   AGE
ktp-ocr   ktp_ocr:uipktisj5sjyzz3e           2             5             4m32s
Xipeng Guan
10/17/2022, 9:55 AM
kubectl -n yatai-deployment rollout restart deploy/yatai-deployment
Benjamin Tan
10/17/2022, 9:57 AM
Benjamin Tan
10/17/2022, 9:57 AM
1.66600064381458e+09 DEBUG events Getting Deployment yatai/ktp-ocr {"type": "Normal", "object": {"kind":"BentoDeployment","namespace":"yatai","name":"ktp-ocr","uid":"d07c22f9-4e6f-431c-a3f8-eb8e542f09bf","apiVersion":"serving.yatai.ai/v1alpha3","resourceVersion":"1505490115"}, "reason": "GetDeployment"}
1.666000643814588e+09 DEBUG events Creating a new Deployment yatai/ktp-ocr {"type": "Normal", "object": {"kind":"BentoDeployment","namespace":"yatai","name":"ktp-ocr","uid":"d07c22f9-4e6f-431c-a3f8-eb8e542f09bf","apiVersion":"serving.yatai.ai/v1alpha3","resourceVersion":"1505490115"}, "reason": "CreateDeployment"}
1.6660006438265834e+09 INFO Deployment created. {"controller": "bentodeployment", "controllerGroup": "serving.yatai.ai", "controllerKind": "BentoDeployment", "BentoDeployment": {"name":"ktp-ocr","namespace":"yatai"}, "namespace": "yatai", "name": "ktp-ocr", "reconcileID": "706a6b47-d9bf-4621-bf61-6e873adf49cb", "namespace": "yatai", "name": "ktp-ocr"}
1.6660006438266666e+09 INFO HPA not found. Creating a new one. {"controller": "bentodeployment", "controllerGroup": "serving.yatai.ai", "controllerKind": "BentoDeployment", "BentoDeployment": {"name":"ktp-ocr","namespace":"yatai"}, "namespace": "yatai", "name": "ktp-ocr", "reconcileID": "706a6b47-d9bf-4621-bf61-6e873adf49cb", "namespace": "yatai", "name": "ktp-ocr"}
1.6660006438267765e+09 DEBUG events Created Deployment yatai/ktp-ocr {"type": "Normal", "object": {"kind":"BentoDeployment","namespace":"yatai","name":"ktp-ocr","uid":"d07c22f9-4e6f-431c-a3f8-eb8e542f09bf","apiVersion":"serving.yatai.ai/v1alpha3","resourceVersion":"1505490115"}, "reason": "CreateDeployment"}
1.6660006438268425e+09 DEBUG events Getting HPA yatai/ktp-ocr {"type": "Normal", "object": {"kind":"BentoDeployment","namespace":"yatai","name":"ktp-ocr","uid":"d07c22f9-4e6f-431c-a3f8-eb8e542f09bf","apiVersion":"serving.yatai.ai/v1alpha3","resourceVersion":"1505490115"}, "reason": "GetHPA"}
1.6660006438268564e+09 DEBUG events Creating a new HPA yatai/ktp-ocr {"type": "Normal", "object": {"kind":"BentoDeployment","namespace":"yatai","name":"ktp-ocr","uid":"d07c22f9-4e6f-431c-a3f8-eb8e542f09bf","apiVersion":"serving.yatai.ai/v1alpha3","resourceVersion":"1505490115"}, "reason": "CreateHPA"}
1.6660006438396223e+09 INFO HPA created. {"controller": "bentodeployment", "controllerGroup": "serving.yatai.ai", "controllerKind": "BentoDeployment", "BentoDeployment": {"name":"ktp-ocr","namespace":"yatai"}, "namespace": "yatai", "name": "ktp-ocr", "reconcileID": "706a6b47-d9bf-4621-bf61-6e873adf49cb", "namespace": "yatai", "name": "ktp-ocr"}
1.6660006438397336e+09 INFO Service not found. Creating a new one. {"controller": "bentodeployment", "controllerGroup": "serving.yatai.ai", "controllerKind": "BentoDeployment", "BentoDeployment": {"name":"ktp-ocr","namespace":"yatai"}, "namespace": "yatai", "name": "ktp-ocr", "reconcileID": "706a6b47-d9bf-4621-bf61-6e873adf49cb", "namespace": "yatai", "name": "ktp-ocr"}
1.6660006438397908e+09 DEBUG events Created HPA yatai/ktp-ocr {"type": "Normal", "object": {"kind":"BentoDeployment","namespace":"yatai","name":"ktp-ocr","uid":"d07c22f9-4e6f-431c-a3f8-eb8e542f09bf","apiVersion":"serving.yatai.ai/v1alpha3","resourceVersion":"1505490115"}, "reason": "CreateHPA"}
1.6660006438398662e+09 DEBUG events Getting Service yatai/ktp-ocr {"type": "Normal", "object": {"kind":"BentoDeployment","namespace":"yatai","name":"ktp-ocr","uid":"d07c22f9-4e6f-431c-a3f8-eb8e542f09bf","apiVersion":"serving.yatai.ai/v1alpha3","resourceVersion":"1505490115"}, "reason": "GetService"}
1.6660006438399072e+09 DEBUG events Creating a new Service yatai/ktp-ocr {"type": "Normal", "object": {"kind":"BentoDeployment","namespace":"yatai","name":"ktp-ocr","uid":"d07c22f9-4e6f-431c-a3f8-eb8e542f09bf","apiVersion":"serving.yatai.ai/v1alpha3","resourceVersion":"1505490115"}, "reason": "CreateService"}
1.6660006438817403e+09 INFO Service created. {"controller": "bentodeployment", "controllerGroup": "serving.yatai.ai", "controllerKind": "BentoDeployment", "BentoDeployment": {"name":"ktp-ocr","namespace":"yatai"}, "namespace": "yatai", "name": "ktp-ocr", "reconcileID": "706a6b47-d9bf-4621-bf61-6e873adf49cb", "namespace": "yatai", "name": "ktp-ocr"}
1.6660006438818698e+09 DEBUG events Created Service yatai/ktp-ocr {"type": "Normal", "object": {"kind":"BentoDeployment","namespace":"yatai","name":"ktp-ocr","uid":"d07c22f9-4e6f-431c-a3f8-eb8e542f09bf","apiVersion":"serving.yatai.ai/v1alpha3","resourceVersion":"1505490115"}, "reason": "CreateService"}
1.6660006438818934e+09 DEBUG events Generating hostname for ingress {"type": "Normal", "object": {"kind":"BentoDeployment","namespace":"yatai","name":"ktp-ocr","uid":"d07c22f9-4e6f-431c-a3f8-eb8e542f09bf","apiVersion":"serving.yatai.ai/v1alpha3","resourceVersion":"1505490115"}, "reason": "GenerateIngressHost"}
time="2022-10-17T09:57:23Z" level=info msg="Creating ingress default-domain- to get a ingress IP automatically"
time="2022-10-17T09:57:24Z" level=info msg="Waiting for ingress default-domain-tpvzb to be ready"
1.6660006454443502e+09 INFO start cleaning up abandoned runner services {"func": "doCleanUpAbandonedRunnerServices"}
1.6660006454445243e+09 INFO finished cleaning up abandoned runner services {"func": "doCleanUpAbandonedRunnerServices"}
1.66600064556558e+09 INFO getting yatai client {"func": "doBuildBentoImages"}
1.666000645565871e+09 INFO getting docker registry {"func": "doBuildBentoImages"}
1.6660006455699806e+09 INFO listing bentos from yatai {"func": "doBuildBentoImages"}
1.6660006456261096e+09 INFO found 1 bentos need to build image {"func": "doBuildBentoImages"}
1.666000645626186e+09 INFO checking image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
1.666000645655784e+09 INFO image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
Benjamin Tan
10/17/2022, 9:58 AM
ktp-ocr-6957f7677c-ld5w8 0/2 Pending 0 24s <none> <none> <none> <none>
ktp-ocr-6957f7677c-lgvhd 2/2 Running 0 42s 10.65.158.135 gke-sandbox-data-k8s-high-cpu-preempt-68919cc7-hzdz <none> <none>
ktp-ocr-runner-0-5867d8c7fb-9mfh5 2/2 Running 0 42s 10.65.158.133 gke-sandbox-data-k8s-high-cpu-preempt-68919cc7-hzdz <none> <none>
ktp-ocr-runner-0-5867d8c7fb-qjs9j 0/2 Pending 0 25s <none> <none> <none> <none>
ktp-ocr-runner-1-5d647b4cf7-67mxh 2/2 Running 0 42s 10.65.158.134 gke-sandbox-data-k8s-high-cpu-preempt-68919cc7-hzdz <none> <none>
ktp-ocr-runner-1-5d647b4cf7-rbsr4 0/2 Pending 0 25s <none> <none> <none> <none>
ktp-ocr-runner-2-5df969cff5-p45zq 0/2 Pending 0 25s <none> <none> <none> <none>
ktp-ocr-runner-2-5df969cff5-pz2gm 0/2 Pending 0 42s <none> <none> <none> <none>
ktp-ocr-runner-3-5b58f64b77-lmt7j 0/2 Pending 0 42s <none> <none> <none> <none>
ktp-ocr-runner-3-5b58f64b77-n2kfb 0/2 Pending 0 25s <none> <none>
Xipeng Guan
10/17/2022, 9:58 AM
kubectl -n yatai describe pod ktp-ocr-6957f7677c-ld5w8
Benjamin Tan
10/17/2022, 9:59 AM
Warning FailedScheduling 70s (x5 over 97s) default-scheduler 0/79 nodes are available: 1 Insufficient memory, 1 node(s) had taint {monitoring: true}, that the pod didn't tolerate, 3 node(s) had taint {airflowexecutors: true}, that the pod didn't tolerate, 3 node(s) had taint {jupyter-notebooks: true}, that the pod didn't tolerate, 3 node(s) had taint {kafka-connect: true}, that the pod didn't tolerate, 3 node(s) had taint {kubeflow-pipelines: true}, that the pod didn't tolerate, 3 node(s) had taint {preemptible: true}, that the pod didn't tolerate, 56 node(s) didn't match Pod's node affinity/selector, 6 Insufficient cpu.
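With the taint out of the way, the remaining blockers are "Insufficient cpu" and "Insufficient memory". A rough way to see why is to add up what the spec requests at `min_replicas`. This is a back-of-the-envelope Python sketch: the request values are copied from the BentoDeployment YAML posted earlier, the `api-server` label for the top-level `resources` block is just an illustrative name, and any sidecar containers in the pods are not counted.

```python
def cpu_millicores(quantity):
    """Parse a Kubernetes CPU quantity such as '500m' or '4' into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_mib(quantity):
    """Parse a memory quantity given in Mi (the only unit used in this spec)."""
    if not quantity.endswith("Mi"):
        raise ValueError(f"unsupported memory unit: {quantity}")
    return int(quantity[:-2])


# (component, cpu request, memory request or None, min_replicas)
components = [
    ("api-server", "500m", None, 2),  # top-level resources block; no memory request set
    ("face_image_cropper", "500m", "500Mi", 2),
    ("card_detector", "4000m", "4096Mi", 2),
    ("orientation_fixer", "500m", "1024Mi", 2),
    ("ocr_runnable", "4000m", "4096Mi", 2),
]

total_cpu = sum(cpu_millicores(cpu) * replicas for _, cpu, _, replicas in components)
total_mem = sum(memory_mib(mem) * replicas for _, _, mem, replicas in components if mem)

print(f"CPU requested at min_replicas:    {total_cpu}m")
print(f"Memory requested at min_replicas: {total_mem}Mi")
```

That comes to 19000m of CPU and 19432Mi of memory requested before any autoscaling, all of it pinned to the `high-cpu-preemptible` pool by the nodeSelector, so the second replica of each component can stay Pending until that pool has more free capacity.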
Benjamin Tan
10/17/2022, 9:59 AM
Benjamin Tan
10/17/2022, 10:00 AM
Xipeng Guan
10/17/2022, 10:01 AM
Benjamin Tan
10/17/2022, 10:02 AM
kubectl -n yatai-deployment rollout restart deploy/yatai-deployment ?
Benjamin Tan
10/17/2022, 10:03 AM
tolerations work before and I had to restart it to get it to work?
Xipeng Guan
10/17/2022, 10:04 AM
Benjamin Tan
10/17/2022, 10:04 AM
Benjamin Tan
10/17/2022, 10:04 AM
1.6660010355662887e+09 INFO getting yatai client {"func": "doBuildBentoImages"}
1.666001035567261e+09 INFO getting docker registry {"func": "doBuildBentoImages"}
1.6660010355735934e+09 INFO listing bentos from yatai {"func": "doBuildBentoImages"}
1.6660010356500514e+09 INFO found 1 bentos need to build image {"func": "doBuildBentoImages"}
1.6660010356510475e+09 INFO checking image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
1.6660010356893494e+09 INFO image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
1.6660010654443023e+09 INFO start cleaning up abandoned runner services {"func": "doCleanUpAbandonedRunnerServices"}
1.6660010654447703e+09 INFO finished cleaning up abandoned runner services {"func": "doCleanUpAbandonedRunnerServices"}
1.6660010655656343e+09 INFO getting yatai client {"func": "doBuildBentoImages"}
1.666001065566664e+09 INFO getting docker registry {"func": "doBuildBentoImages"}
1.666001065571088e+09 INFO listing bentos from yatai {"func": "doBuildBentoImages"}
1.6660010656432586e+09 INFO found 1 bentos need to build image {"func": "doBuildBentoImages"}
1.6660010656436129e+09 INFO checking image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
1.6660010656611772e+09 INFO image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
Xipeng Guan
10/17/2022, 10:05 AM
kubectl -n yatai-deployment get pod
Benjamin Tan
10/17/2022, 10:05 AM
Xipeng Guan
10/17/2022, 10:06 AM
kubectl -n yatai describe bentodeployment ktp-ocr
Benjamin Tan
10/17/2022, 10:07 AM
Name: ktp-ocr
Namespace: yatai
Labels: <none>
Annotations: <none>
API Version: serving.yatai.ai/v1alpha3
Kind: BentoDeployment
Metadata:
  Creation Timestamp: 2022-10-17T10:03:43Z
  Generation: 1
  Managed Fields:
    API Version: serving.yatai.ai/v1alpha3
    Fields Type: FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:autoscaling:
          .:
          f:max_replicas:
          f:min_replicas:
        f:bento_tag:
        f:extra_pod_spec:
          .:
          f:nodeSelector:
            .:
            f:cloud.google.com/gke-nodepool:
          f:tolerations:
        f:ingress:
          .:
          f:enabled:
        f:resources:
          .:
          f:limits:
            .:
            f:cpu:
          f:requests:
            .:
            f:cpu:
        f:runners:
    Manager: kubectl-create
    Operation: Update
    Time: 2022-10-17T10:03:43Z
  Resource Version: 1505533367
  UID: a21a7676-0084-491c-bb35-fb067092f42a
Spec:
  Autoscaling:
    max_replicas: 3
    min_replicas: 2
  bento_tag: ktp_ocr:uipktisj5sjyzz3e
  extra_pod_spec:
    Node Selector:
      cloud.google.com/gke-nodepool: high-cpu-preemptible
    Tolerations:
      Effect: NoSchedule
      Key: high-cpu-preemptible
      Operator: Exists
  Ingress:
    Enabled: true
  Resources:
    Limits:
      Cpu: 1000m
    Requests:
      Cpu: 500m
  Runners:
    Autoscaling:
      max_replicas: 3
      min_replicas: 2
    extra_pod_spec:
      Node Selector:
        cloud.google.com/gke-nodepool: high-cpu-preemptible
      Tolerations:
        Effect: NoSchedule
        Key: high-cpu-preemptible
        Operator: Exists
    Name: face_image_cropper
    Resources:
      Limits:
        Cpu: 1000m
        Memory: 1024Mi
      Requests:
        Cpu: 500m
        Gpu:
        Memory: 500Mi
    Autoscaling:
      max_replicas: 3
      min_replicas: 2
    extra_pod_spec:
      Node Selector:
        cloud.google.com/gke-nodepool: high-cpu-preemptible
      Tolerations:
        Effect: NoSchedule
        Key: high-cpu-preemptible
        Operator: Exists
    Name: card_detector
    Resources:
      Limits:
        Cpu: 4000m
        Memory: 6144Mi
      Requests:
        Cpu: 4000m
        Memory: 4096Mi
    Autoscaling:
      max_replicas: 3
      min_replicas: 2
    extra_pod_spec:
      Node Selector:
        cloud.google.com/gke-nodepool: high-cpu-preemptible
      Tolerations:
        Effect: NoSchedule
        Key: high-cpu-preemptible
        Operator: Exists
    Name: orientation_fixer
    Resources:
      Limits:
        Cpu: 1000m
        Memory: 2048Mi
      Requests:
        Cpu: 500m
        Memory: 1024Mi
    Autoscaling:
      max_replicas: 3
      min_replicas: 2
    extra_pod_spec:
      Node Selector:
        cloud.google.com/gke-nodepool: high-cpu-preemptible
      Tolerations:
        Effect: NoSchedule
        Key: high-cpu-preemptible
        Operator: Exists
    Name: ocr_runnable
    Resources:
      Limits:
        Cpu: 8000m
        Memory: 6144Mi
      Requests:
        Cpu: 4000m
        Memory: 4096Mi
Events: <none>
Xipeng Guan
10/17/2022, 10:12 AM
Benjamin Tan
10/17/2022, 10:12 AM
es"}
1.666001515508279e+09 INFO getting yatai client {"func": "doRegisterYataiComponent"}
1.6660015155662005e+09 INFO getting yatai client {"func": "doBuildBentoImages"}
1.6660015155666518e+09 INFO getting docker registry {"func": "doBuildBentoImages"}
1.6660015155703478e+09 INFO listing bentos from yatai {"func": "doBuildBentoImages"}
1.6660015156989498e+09 INFO found 1 bentos need to build image {"func": "doBuildBentoImages"}
1.666001515699035e+09 INFO checking image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
1.6660015157316504e+09 INFO image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
1.6660015454444134e+09 INFO start cleaning up abandoned runner services {"func": "doCleanUpAbandonedRunnerServices"}
1.6660015454446256e+09 INFO finished cleaning up abandoned runner services {"func": "doCleanUpAbandonedRunnerServices"}
1.666001545565696e+09 INFO getting yatai client {"func": "doBuildBentoImages"}
1.6660015455661626e+09 INFO getting docker registry {"func": "doBuildBentoImages"}
1.6660015455704405e+09 INFO listing bentos from yatai {"func": "doBuildBentoImages"}
1.6660015456427312e+09 INFO found 1 bentos need to build image {"func": "doBuildBentoImages"}
1.6660015456428373e+09 INFO checking image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
1.6660015456696532e+09 INFO image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
Benjamin Tan
10/17/2022, 10:12 AM
Benjamin Tan
10/17/2022, 10:13 AM
1.6660015754445655e+09 INFO start cleaning up abandoned runner services {"func": "doCleanUpAbandonedRunnerServices"}
1.666001575444745e+09 INFO finished cleaning up abandoned runner services {"func": "doCleanUpAbandonedRunnerServices"}
1.6660015755658126e+09 INFO getting yatai client {"func": "doBuildBentoImages"}
1.6660015755661314e+09 INFO getting docker registry {"func": "doBuildBentoImages"}
1.66600157557021e+09 INFO listing bentos from yatai {"func": "doBuildBentoImages"}
1.6660015756370196e+09 INFO found 1 bentos need to build image {"func": "doBuildBentoImages"}
1.6660015756370828e+09 INFO checking image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
1.6660015756603642e+09 INFO image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists {"func": "doBuildBentoImages", "bentoTag": "ktp_ocr:uipktisj5sjyzz3e"}
Benjamin Tan
10/17/2022, 10:13 AM
Benjamin Tan
10/17/2022, 10:26 AM
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal GetBento 9m19s yatai-deployment Fetching Bento ktp_ocr:uipktisj5sjyzz3e
Normal GetBento 9m19s yatai-deployment Fetched Bento ktp_ocr:uipktisj5sjyzz3e
Normal MakeSureDockerRegcred 9m19s yatai-deployment Successfully made sure docker registry credentials
Normal GetMajorCluster 9m19s yatai-deployment Fetching major cluster
Normal GetMajorCluster 9m19s yatai-deployment Successfully fetched major cluster
Normal GetYataiVersion 9m19s yatai-deployment Fetching yatai version
Normal GetYataiVersion 9m19s yatai-deployment Successfully fetched yatai version
Normal CheckImageExists 9m19s (x2 over 9m19s) yatai-deployment Checking image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists
Normal CheckImageExists 9m19s (x2 over 9m19s) yatai-deployment Image docker-registry.yatai-deployment.svc.cluster.local:5000/bentos:yatai.ktp_ocr.uipktisj5sjyzz3e exists
Normal GetDeployment 9m19s yatai-deployment Getting Deployment yatai/ktp-ocr-runner-0
Normal CreateDeployment 9m19s yatai-deployment Creating a new Deployment yatai/ktp-ocr-runner-0
Normal CreateDeployment 9m19s yatai-deployment Created Deployment yatai/ktp-ocr-runner-0
Normal GetHPA 9m19s yatai-deployment Getting HPA yatai/ktp-ocr-runner-0
Normal CreateHPA 9m19s yatai-deployment Creating a new HPA yatai/ktp-ocr-runner-0
Normal CreateHPA 9m19s yatai-deployment Created HPA yatai/ktp-ocr-runner-0
Normal GetService 9m19s yatai-deployment Getting Service yatai/ktp-ocr-runner-9907d2e7d03a92835473bfddf5cc9c93
Normal CreateService 9m19s yatai-deployment Creating a new Service yatai/ktp-ocr-runner-9907d2e7d03a92835473bfddf5cc9c93
Normal CreateService 9m19s yatai-deployment Created Service yatai/ktp-ocr-runner-9907d2e7d03a92835473bfddf5cc9c93
Normal GetDeployment 9m19s yatai-deployment Getting Deployment yatai/ktp-ocr-runner-1
Normal CreateDeployment 9m19s yatai-deployment Creating a new Deployment yatai/ktp-ocr-runner-1
Normal CreateDeployment 9m19s yatai-deployment Created Deployment yatai/ktp-ocr-runner-1
Normal GetHPA 9m19s yatai-deployment Getting HPA yatai/ktp-ocr-runner-1
Normal CreateHPA 9m19s yatai-deployment Creating a new HPA yatai/ktp-ocr-runner-1
Xipeng Guan
10/17/2022, 10:27 AM
Benjamin Tan
10/17/2022, 10:29 AM
Benjamin Tan
10/17/2022, 10:29 AM
Benjamin Tan
10/17/2022, 10:29 AM
Xipeng Guan
10/17/2022, 10:30 AM
kubectl -n yatai get bentodeployment -w
Then open a new terminal and execute the following commands:
kubectl -n yatai get bentodeployment ktp-ocr -o yaml > /tmp/a.yaml
kubectl delete -f /tmp/a.yaml
kubectl apply -f /tmp/a.yaml
Xipeng Guan
10/17/2022, 10:31 AM
Benjamin Tan
10/17/2022, 10:40 AM
Benjamin Tan
10/17/2022, 10:40 AM
bentodeployment.serving.yatai.ai "ktp-ocr" deleted
Benjamin Tan
10/17/2022, 10:40 AM
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
NAME      BENTO                      READY   MINREPLICAS   MAXREPLICAS   AGE
ktp-ocr   ktp_ocr:uipktisj5sjyzz3e           2             3             29m
ktp-ocr   ktp_ocr:uipktisj5sjyzz3e           2             3             29m
Benjamin Tan
10/17/2022, 10:40 AM
Benjamin Tan
10/17/2022, 10:40 AM
kubectl -n yatai get bentodeployment
W1017 18:40:32.021589 309352 gcp.go:119] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.26+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
No resources found in yatai namespace.
Benjamin Tan
10/17/2022, 10:41 AM
Benjamin Tan
10/17/2022, 10:41 AM
Benjamin Tan
10/17/2022, 10:42 AM
Benjamin Tan
10/17/2022, 11:25 AM
Xipeng Guan
10/18/2022, 1:02 AM
kubectl -n yatai-system logs -f deploy/yatai
Benjamin Tan
10/18/2022, 1:03 AM
Benjamin Tan
10/18/2022, 1:05 AM
k get deploy -n yatai
W1018 09:03:43.199301 1561456 gcp.go:119] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.26+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
NAME READY UP-TO-DATE AVAILABLE AGE
ktp-ocr 2/2 2 2 11h
ktp-ocr-runner-0 1/1 1 1 11h
ktp-ocr-runner-1 1/1 1 1 11h
ktp-ocr-runner-2 1/1 1 1 11h
ktp-ocr-runner-3 1/1 1 1 11h
Benjamin Tan
10/18/2022, 1:05 AM
[311.900ms] [rows:1] SELECT count(*) FROM "user" WHERE perm = 'admin' AND "user"."deleted_at" IS NULL
INFO[2610] listing unsynced deployments cron="sync env"
INFO[2610] updating unsynced deployments syncing_at cron="sync env"
INFO[2610] updated unsynced deployments syncing_at cron="sync env"
INFO[2610] syncing unsynced app deployment deployments... cron="sync env"
INFO[2610] synced unsynced app deployment deployments... cron="sync env"
INFO[2700] listing unsynced deployments cron="sync env"
INFO[2700] updating unsynced deployments syncing_at cron="sync env"
INFO[2700] updated unsynced deployments syncing_at cron="sync env"
INFO[2700] syncing unsynced app deployment deployments... cron="sync env"
INFO[2700] synced unsynced app deployment deployments... cron="sync env"
INFO[2790] listing unsynced deployments cron="sync env"
INFO[2790] updating unsynced deployments syncing_at cron="sync env"
INFO[2790] updated unsynced deployments syncing_at cron="sync env"
INFO[2790] syncing unsynced app deployment deployments... cron="sync env"
INFO[2790] synced unsynced app deployment deployments... cron="sync env"
INFO[2880] listing unsynced deployments cron="sync env"
INFO[2880] updating unsynced deployments syncing_at cron="sync env"
INFO[2880] updated unsynced deployments syncing_at cron="sync env"
INFO[2880] syncing unsynced app deployment deployments... cron="sync env"
INFO[2880] synced unsynced app deployment deployments... cron="sync env"
INFO[2970] listing unsynced deployments cron="sync env"
INFO[2970] updating unsynced deployments syncing_at cron="sync env"
INFO[2970] updated unsynced deployments syncing_at cron="sync env"
INFO[2970] syncing unsynced app deployment deployments... cron="sync env"
INFO[2970] synced unsynced app deployment deployments... cron="sync env"
Benjamin Tan
10/18/2022, 1:15 AM
Xipeng Guan
10/18/2022, 1:20 AM
Benjamin Tan
10/18/2022, 1:21 AM
Benjamin Tan
10/18/2022, 1:22 AM
Warning GenerateIngressHost 46m yatai-deployment Failed to generate hostname for ingress: get domain suffix: failed to wait for ingress default-domain-w4b2p to be ready: timed out waiting for the condition
Warning ReconcileError 46m yatai-deployment Failed to reconcile BentoDeployment: get domain suffix: failed to wait for ingress default-domain-w4b2p to be ready: timed out waiting for the condition
Warning GenerateIngressHost 26m yatai-deployment Failed to generate hostname for ingress: get domain suffix: failed to wait for ingress default-domain-v28cp to be ready: timed out waiting for the condition
Warning ReconcileError 26m yatai-deployment Failed to reconcile BentoDeployment: get domain suffix: failed to wait for ingress default-domain-v28cp to be ready: timed out waiting for the condition
Normal GetBento 6m55s (x38 over 13h) yatai-deployment Fetching Bento ktp_ocr:uipktisj5sjyzz3e
Normal GetBento 6m55s (x38 over 13h) yatai-deployment Fetched Bento ktp_ocr:uipktisj5sjyzz3e
Warning GenerateIngressHost 6m55s yatai-deployment Failed to generate hostname for ingress: get domain suffix: failed to wait for ingress default-domain-j7529 to be ready: timed out waiting for the condition
Warning ReconcileError 6m55s yatai-deployment Failed to reconcile BentoDeployment: get domain suffix: failed to wait for ingress default-domain-j7529 to be ready: timed out waiting for the condition
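The repeated GenerateIngressHost warnings each name a different temporary default-domain ingress. A minimal sketch for pulling those names out of the event text so each one can be inspected; the three event lines are copied from the log above, and the `yatai-deployment` namespace in the printed commands is an assumption based on the ingress listing later in this thread:

```shell
#!/bin/sh
# GenerateIngressHost warnings copied from the events above.
events=$(cat <<'EOF'
Warning GenerateIngressHost 46m yatai-deployment Failed to generate hostname for ingress: get domain suffix: failed to wait for ingress default-domain-w4b2p to be ready: timed out waiting for the condition
Warning GenerateIngressHost 26m yatai-deployment Failed to generate hostname for ingress: get domain suffix: failed to wait for ingress default-domain-v28cp to be ready: timed out waiting for the condition
Warning GenerateIngressHost 6m55s yatai-deployment Failed to generate hostname for ingress: get domain suffix: failed to wait for ingress default-domain-j7529 to be ready: timed out waiting for the condition
EOF
)

# Extract the unique ingress names that never became ready.
stuck=$(printf '%s\n' "$events" | grep -o 'default-domain-[a-z0-9]*' | sort -u)

# Print (not run) a describe command per ingress; the namespace is assumed.
for name in $stuck; do
  echo "kubectl -n yatai-deployment describe ingress $name"
done
```

The loop only prints the commands, so it is safe to run without cluster access; paste the printed lines into a terminal to inspect each ingress that still exists.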
Benjamin Tan
10/18/2022, 1:22 AM
Xipeng Guan
10/18/2022, 1:23 AM
Benjamin Tan
10/18/2022, 1:23 AM
Benjamin Tan
10/18/2022, 1:24 AM
yatai-deployment default-domain-26bwp <none> acd6e8afub7q0u2q1rjlg.this-is-yatai-in-order-to-generate-the-default-domain-suffix.yeah 80 20h
yatai-deployment default-domain-28v5r <none> acd6co8m4ccsv0ub2bm80.this-is-yatai-in-order-to-generate-the-default-domain-suffix.yeah 80 21h
yatai-deployment default-domain-b2hvt <none> acd5oijjvtpudr60vhi7g.this-is-yatai-in-order-to-generate-the-default-domain-suffix.yeah 80 44h
yatai-deployment default-domain-g7vnb <none> acd59b64j9lktirpucrg0.this-is-yatai-in-order-to-generate-the-default-domain-suffix.yeah 80 2d14h
yatai-deployment default-domain-rn4p9 <none> acd6fcfrnti2cqa73vbj0.this-is-yatai-in-order-to-generate-the-default-domain-suffix.yeah 80 18h
yatai-deployment default-domain-rr5bc <none> acd6i6r01othlajo2joc0.this-is-yatai-in-order-to-generate-the-default-domain-suffix.yeah 80 15h
yatai-deployment default-domain-v4wb8 <none> acd6vs9qgt4rtoiosdflg.this-is-yatai-in-order-to-generate-the-default-domain-suffix.yeah 80 9m6s
yatai-deployment default-domain-zz5cb <none> acd6bkahcf4nlirqd131g.this-is-yatai-in-order-to-generate-the-default-domain-suffix.yeah 80
Benjamin Tan
10/18/2022, 1:26 AM
helm upgrade --install?
Xipeng Guan
10/18/2022, 1:27 AM
Benjamin Tan
10/18/2022, 1:36 AM
Benjamin Tan
10/18/2022, 1:36 AM
Benjamin Tan
10/18/2022, 1:36 AM
The network configmap I had was empty.
Benjamin Tan
10/18/2022, 1:37 AM
kubectl -n yatai-deployment patch cm/network --type merge --patch '{"data":{"ingress-class":"${INGRESS_CLASS}"}}'
👆 didn't work. I had to do:
kubectl -n yatai-deployment patch cm/network --type merge --patch '{"data":{"ingress-class":"'${INGRESS_CLASS}'"}}'
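The difference between the two patch commands comes down to shell quoting: inside single quotes `${INGRESS_CLASS}` is passed through literally, so the first command writes the text `${INGRESS_CLASS}` into the configmap. A small sketch that shows the resulting patch strings without touching a cluster; the value `nginx` is only a placeholder assumption, not taken from this thread:

```shell
#!/bin/sh
# Placeholder value for illustration; use the ingress class of your cluster.
INGRESS_CLASS=nginx

# Single quotes suppress expansion: the variable name stays in the JSON.
broken='{"data":{"ingress-class":"${INGRESS_CLASS}"}}'

# Closing the single quotes around the variable lets the shell expand it.
fixed='{"data":{"ingress-class":"'${INGRESS_CLASS}'"}}'

# Same idea, but the expansion is double-quoted so values with spaces survive.
robust='{"data":{"ingress-class":"'"${INGRESS_CLASS}"'"}}'

printf '%s\n' "$broken" "$fixed" "$robust"
```

Printing the string first is a cheap way to confirm what `kubectl patch` will actually receive before running it.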
Benjamin Tan
10/18/2022, 1:37 AM
Benjamin Tan
10/18/2022, 1:37 AM
Xipeng Guan
10/18/2022, 2:12 AM
Benjamin Tan
10/18/2022, 2:15 AM
Xipeng Guan
10/18/2022, 2:18 AM