Kubeflow pitfalls log

Kubeflow UI Jupyter permission problem

User None is not authorized to list … for namespace: anonymous

The fixes suggested in the official issue were tried, but they had no effect. One of the issues mentions that the Jupyter web app can be switched to dev mode, which removes the permission check.

User None is not authorized to list … · kubeflow/kubeflow

Jupyter source

kubeflow/kubeflow

# Modify the Jupyter kustomize manifest: add the dev-mode parameters (FLASK_ENV, command, args), then restart Jupyter
# .cache/manifests/manifests-0.7-branch/jupyter/jupyter-web-app/base/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment
spec:
  replicas: 1
  template:
    spec:
      containers:
      - env:
        - name: ROK_SECRET_NAME
          valueFrom:
            configMapKeyRef:
              name: parameters
              key: ROK_SECRET_NAME
        - name: UI
          valueFrom:
            configMapKeyRef:
              name: parameters
              key: UI
        - name: USERID_HEADER
          value: $(userid-header)
        - name: USERID_PREFIX
          value: $(userid-prefix)
        - name: FLASK_ENV
          value: development
        image: gcr.io/kubeflow-images-public/jupyter-web-app:v0.5.0
        imagePullPolicy: $(policy)
        command: ["python3", "main.py"]
        args: ["--dev"]
        name: jupyter-web-app
        ports:
        - containerPort: 5000
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
      serviceAccountName: service-account
      volumes:
      - configMap:
          name: config
        name: config-volume

# Restart by deleting and re-applying the manifests
kustomize build | kubectl delete -f -
kustomize build | kubectl apply -f -
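As a rough sketch, the delete-and-apply restart can be wrapped in a small shell function that also waits for the new pod to become ready. The function name `redeploy`, its arguments, and the 120-second timeout are placeholders chosen for illustration; the deployment name in your cluster depends on the kustomize name prefix and is not taken from this post.

```shell
# Sketch: re-apply a kustomize directory and wait for one deployment.
# Arguments: <kustomize-dir> <namespace> <deployment-name> (all placeholders).
redeploy() {
  local dir="$1" namespace="$2" deployment="$3"
  # Remove the old objects; tolerate objects that are already gone.
  kustomize build "$dir" | kubectl delete --ignore-not-found -f -
  # Recreate them from the edited manifests.
  kustomize build "$dir" | kubectl apply -f -
  # Block until the rollout finishes or the timeout expires.
  kubectl -n "$namespace" rollout status "deployment/$deployment" --timeout=120s
}
```

It could be called as `redeploy . kubeflow jupyter-web-app-deployment`, assuming that is the deployment name in your installation.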

Error message in the UI

Error: mysql_query failed: errno: 2006, error: MySQL server has gone away. Code: 13

Restarting the metadata gRPC pod, as suggested in the official issue, resolved it.

Error: mysql_query failed: errno: 2006, error: MySQL server has gone away. Code: 13 · Issue #4604 · kubeflow/kubeflow

Root cause

mysql_query failed: errno: 2006, error: MySQL server has gone away · Issue #198 · kubeflow/metadata
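The restart described above might look like the following sketch. The namespace `kubeflow` and the deployment name `metadata-grpc-deployment` are assumptions about a default installation, so check them with `kubectl get deployments` first; `kubectl rollout restart` also requires kubectl 1.15 or later.

```shell
# Sketch: restart the metadata gRPC deployment and wait for it to come back.
# Both defaults below are assumptions; pass your own values if they differ.
restart_metadata_grpc() {
  local namespace="${1:-kubeflow}"
  local deployment="${2:-metadata-grpc-deployment}"
  # Trigger a rolling restart so the pod opens a fresh MySQL connection.
  kubectl -n "$namespace" rollout restart "deployment/$deployment"
  # Wait until the replacement pod is ready.
  kubectl -n "$namespace" rollout status "deployment/$deployment" --timeout=120s
}
```

On older kubectl versions, deleting the pod and letting the deployment recreate it has the same effect.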

Cannot connect to the notebook server

Sorry, /notebook is not a valid page #5010

Check whether port-forward works

kubectl port-forward svc/kenwood-test -n anonymous 8080:80 --address 10.10.62.180
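While the port-forward is running in another terminal, a quick probe such as the sketch below shows whether anything answers on the forwarded port. The helper name `http_status` and the 5-second timeout are illustrative choices, not part of the original troubleshooting steps.

```shell
# Sketch: print the HTTP status code returned by a URL.
# curl reports 000 when no HTTP response was received at all.
http_status() {
  local url="$1"
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url"
}
```

For example, `http_status http://10.10.62.180:8080/` prints the status code served through the forward above.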

According to the official issue, the notebook-controller deployment parameters are hard-coded and USE_ISTIO is not turned on.

Sorry, /notebook is not a valid page · Issue #5010 · kubeflow/kubeflow

Modify the notebook-controller parameters

# .cache/manifests/manifests-0.7-branch/jupyter/notebook-controller/base/deployment.yaml
# Change the USE_ISTIO value to true
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment
spec:
  template:
    spec:
      containers:
      - name: manager
        image: gcr.io/kubeflow-images-public/notebook-controller:v20190614-v0-160-g386f2749-e3b0c4
        command:
          - /manager
        env:
          - name: USE_ISTIO
            value: "true"
          - name: POD_LABELS
            value: $(POD_LABELS)
        imagePullPolicy: IfNotPresent
        livenessProbe:
          httpGet:
            path: /metrics
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 30
      serviceAccountName: service-account

Restart the notebook-controller

kustomize build | kubectl delete -f -
kustomize build | kubectl apply -f -
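After the restart, it can help to confirm that the running deployment actually picked up the new value. The sketch below reads the variable back with a JSONPath query; the namespace `kubeflow` and the deployment name `notebook-controller-deployment` are assumptions about a default installation.

```shell
# Sketch: print the USE_ISTIO value set on the deployment's first container.
# Both defaults are assumptions; pass your own values if they differ.
use_istio_value() {
  local namespace="${1:-kubeflow}"
  local deployment="${2:-notebook-controller-deployment}"
  kubectl -n "$namespace" get deployment "$deployment" \
    -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="USE_ISTIO")].value}'
}
```

If the function prints nothing or `false`, the edited manifest has not been applied to the cluster.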

Image problems

  1. Some images use the Always pull policy, which needs to be changed to IfNotPresent

  2. Some images are referenced by SHA256 digest and need to be changed to a tag

  3. gcr.io images are hard to pull, so a GitHub Action is used to synchronize them to Docker Hub

    You can fork my project and adapt it

kenwoodjw/sync_gcr

SHA digest used in knative-install · Issue #1521 · SSE
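The three image fixes listed above can be sketched as small text helpers. These are hypothetical: in particular, the mirror naming scheme (Docker Hub user plus the last path component) is an assumption and not necessarily what kenwoodjw/sync_gcr produces, and it can collide when two images share the same final name.

```shell
# Hypothetical helpers for rewriting image references in manifests.

# Map an image to a Docker Hub mirror name: <user>/<last path component>.
mirror_image() {
  local image="$1" user="$2"
  printf '%s/%s\n' "$user" "${image##*/}"
}

# Replace an @sha256:<digest> suffix with an explicit tag.
digest_to_tag() {
  local image="$1" tag="$2"
  printf '%s:%s\n' "${image%%@sha256:*}" "$tag"
}

# Rewrite the pull policy in a manifest read from stdin.
relax_pull_policy() {
  sed 's/imagePullPolicy: Always/imagePullPolicy: IfNotPresent/'
}
```

For instance, `relax_pull_policy < deployment.yaml` prints the manifest with the pull policy changed, leaving the original file untouched.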

Conclusion

  • Fuck the GFW; pulling images is a waste of time
  • Every problem was solved by following GitHub issues

KFServing model deployment

Complete Kubeflow tutorial: developing ML models, running distributed training, and deploying services

KFServing is built on Knative and Istio, so two versions of a model can be deployed at the same time for canary deployments or A/B tests.
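To make the canary idea concrete, here is a sketch that prints an InferenceService manifest splitting traffic between two model versions. It assumes the `serving.kubeflow.org/v1alpha2` API and the sklearn predictor; the function name, model name, and storage URIs are placeholders, so verify the fields against the KFServing version you actually installed.

```shell
# Sketch: render a canary InferenceService (v1alpha2 API assumed).
# Arguments: <name> <default-storage-uri> <canary-storage-uri> <canary-percent>
render_canary() {
  local name="$1" default_uri="$2" canary_uri="$3" percent="$4"
  cat <<EOF
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: ${name}
spec:
  default:
    predictor:
      sklearn:
        storageUri: ${default_uri}
  canaryTrafficPercent: ${percent}
  canary:
    predictor:
      sklearn:
        storageUri: ${canary_uri}
EOF
}
```

The output can be piped to `kubectl apply -f -`; here the canary receives the given percentage of requests and the default version serves the rest.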

With Kubeflow v0.7, Knative 0.8 and Istio 1.1.6 are installed by default as part of the Kubeflow installation.

With Kubeflow 1.0, Knative 0.11.1 and Istio 1.1.6 are installed by default.

kubeflow/kfserving

Modify the Knative image tags

gcr.io/knative-releases/knative.dev/serving/cmd/activator:v0.8.0
gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler-hpa:v0.8.0
gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler:v0.8.0
gcr.io/knative-releases/knative.dev/serving/cmd/controller:v0.8.0
gcr.io/knative-releases/knative.dev/serving/cmd/networking/istio:v0.8.0
gcr.io/knative-releases/knative.dev/serving/cmd/webhook:v0.8.0
gcr.io/knative-releases/knative.dev/serving/cmd/queue:v0.8.0
# Images referenced in the KFServing ConfigMap that also need to be synchronized
mcr.microsoft.com/onnxruntime/server:v0.5.0
gcr.io/kfserving/sklearnserver:0.2.0
gcr.io/kfserving/xgbserver:0.2.0
gcr.io/kfserving/pytorchserver:0.2.2
nvcr.io/nvidia/tensorrtserver:19.05-py3
gcr.io/kfserving/alibi-explainer:0.2.2
gcr.io/kfserving/storage-initializer:0.2.2
gcr.io/kfserving/logger:0.2.2