K8sGPT now supports Gemini, so I compared its results with OpenAI's

2024.08.28

Hello, this is oshima. The AI boom shows no sign of slowing down.

K8sGPT, an AI-powered tool for Kubernetes, can work with a variety of AI providers. It can now work with Google's Gemini as well, so I compared Gemini against OpenAI, the default backend.

To state the conclusion first: the responses from OpenAI and Gemini did differ.
If there had been no difference, this post would have been scrapped, so I'm glad.

Environment
Kubernetes: v1.29.6
K8sGPT: v0.3.39

This post uses Google Kubernetes Engine (GKE) for verification.

What is K8sGPT?

The official K8sGPT page is at https://k8sgpt.ai.

So what is K8sGPT?

K8sGPT is a tool for scanning your kubernetes clusters, diagnosing and triaging issues in simple english. It has SRE experience codified into its analyzers and helps to pull out the most relevant information to enrich it with AI.

https://k8sgpt.ai

In other words, it is a tool that sends information to an AI provider and works with the AI to make cluster problems easier to resolve.

K8sGPT is also currently a CNCF Sandbox project.
The CNCF is one of the largest communities promoting cloud native technology.
Kubernetes is a CNCF graduated project, as are the well-known Prometheus and Falco.

For those who want to try it quickly!
There is a scenario on Killercoda.
https://killercoda.com/matthisholleville/scenario/k8sgpt-cli

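Killercoda is a browser-based service where you can run kubectl commands, and it is basically free.
Killercoda provisions the cluster for you, so it is very useful for certification study and for checking basic command usage.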

Installing the k8sgpt command

There are two main ways to use K8sGPT: the CLI and the Operator.
Both connect to the cluster using the kubeconfig and scan it.

CLI

Install the k8sgpt command.

I'm a Mac user, so I use Homebrew.
For other platforms, see the official documentation.

brew tap k8sgpt-ai/k8sgpt
brew install k8sgpt

Starting in serve mode

It can apparently also be run as a server.
https://docs.k8sgpt.ai/reference/cli/

k8sgpt serve

Operator

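A Helm chart is provided.

This post does not cover using K8sGPT through the Operator. Looking at the usage, though, the AI provider's API key and the target cluster's kubeconfig are deployed as Secrets, and a custom resource of kind K8sGPT selects the AI backend and specifies the Secret.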

Basic commands

Managing AI backends

k8sgpt auth

Analyzing the cluster

k8sgpt analyze

Filtering what the analysis targets

k8sgpt analyze --explain --filter=Log --namespace=default

Analyzing the cluster with an AI backend

k8sgpt analyze -b <backend> --explain

Masking information before it is sent to the AI provider

k8sgpt analyze -b <backend> --explain --anonymize

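※※※※※※ Notes on anonymizing information ※※※※※※

※ This information applies to v0.3.39.

Check whether the information you want to keep private is actually covered by the --anonymize option.

For analyzers that support --anonymize, sensitive information is masked before it is sent to the AI backend, but event messages are currently not masked. Event messages can contain Pod names and image names.
Some analyzers also do not support the --anonymize option.

See the official README for details.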
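
To make the idea of masking concrete, here is a minimal Python sketch that swaps names for placeholders before text leaves your machine and restores them in the reply. It is only an illustration and not K8sGPT's implementation, which may work differently; `mask_names`, `unmask`, and the `<MASKED_n>` placeholder format are hypothetical names chosen for this example.

```python
def mask_names(text, sensitive_names):
    """Replace each sensitive name with a placeholder; return the masked text and the mapping."""
    mapping = {}
    masked = text
    # Longest names first, so a short name never rewrites part of a longer one.
    ordered = sorted(set(sensitive_names), key=lambda n: (-len(n), n))
    for i, name in enumerate(ordered):
        placeholder = f"<MASKED_{i}>"
        mapping[placeholder] = name
        masked = masked.replace(name, placeholder)
    return masked, mapping


def unmask(text, mapping):
    """Put the original names back into a response that mentions the placeholders."""
    for placeholder, name in mapping.items():
        text = text.replace(placeholder, name)
    return text


message = "Service default/nginx has no endpoints, expected label app=wrong"
masked, mapping = mask_names(message, ["default", "nginx"])
print(masked)  # Service <MASKED_0>/<MASKED_1> has no endpoints, expected label app=wrong
print(unmask(masked, mapping) == message)  # True
```

Anything that is not in the list of names passes through untouched, which is the same kind of gap as the unmasked event messages described above.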

Authenticating AI backends

K8sGPT lets you choose the AI backend; OpenAI is used by default.
The k8sgpt auth list command shows the AI providers you can choose from.

$ k8sgpt auth list
Default:
> openai
Active:
Unused:
> openai
> localai
> ollama
> azureopenai
> cohere
> amazonbedrock
> amazonsagemaker
> google
> noopai
> huggingface
> googlevertexai
> oci
> watsonxai

Using OpenAI

Running the k8sgpt generate command takes you to the page for issuing an OpenAI API key.

To issue an OpenAI API key, you need to sign up for OpenAI and set up payment information.
Usage starts from a minimum of $5.

$ k8sgpt generate


Opening: https://beta.openai.com/account/api-keys to generate a key for openai

Please copy the generated key and run `k8sgpt auth add` to add it to your config file

Once you have issued an OpenAI API key, run the k8sgpt auth add command.
You are prompted interactively for the API key.

$ k8sgpt auth add
Warning: backend input is empty, will use the default value: openai
Warning: model input is empty, will use the default value: gpt-3.5-turbo
Enter openai Key: <enter your API key here>

After you enter it, the following message is displayed and OpenAI becomes available as a K8sGPT AI backend.

openai added to the AI backend provider list

Using Gemini

Since K8sGPT v0.3.28, Google's Gemini can be used as an AI backend.

To use it, get an API key from Google AI Studio.

k8sgpt auth add --backend google --model gemini-pro --password "<Your API KEY>"

As with OpenAI, the following message is printed and Gemini becomes available as a K8sGPT AI backend.

google added to the AI backend provider list

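Cluster analysis

Preparation

I prepared a GKE Standard cluster.
Use whatever cluster you like.

The command below specifies the cluster name and the location. Three worker nodes are created by default, so --num-nodes sets the number of nodes to start.
To save cost, monitoring and logging are turned off and Spot VMs are used.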

PROJECT=<google cloud project id>

gcloud container clusters create k8sgpt-test \
    --release-channel stable \
    --zone asia-northeast1-b \
    --node-locations asia-northeast1-b \
    --num-nodes 1 \
    --spot \
    --monitoring=NONE \
    --logging=NONE \
    --no-enable-managed-prometheus \
    --project $PROJECT

Once the cluster is created, generate the kubeconfig context.

gcloud container clusters get-credentials k8sgpt-test \
    --zone asia-northeast1-b \
    --project $PROJECT

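Deploy a Deployment, a Service, and an HPA.
The manifests are deliberately misconfigured so that they do not work properly, and K8sGPT is then used to analyze the cluster.

What is broken

1. The Pod cannot be scheduled because of its resource request
2. The initContainer fails to start
3. The livenessProbe causes restarts
4. No Pod matches the Service's label selector
5. No Deployment matches the HPA's target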

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      initContainers:         # 2. initContainer fails to start
      - image: busybox
        name: initialize
        command: ['sh', '-c', 'touch /root/noaccess'] 
        securityContext:
          runAsUser: 1000
          runAsGroup: 1000
      containers:
      - image: nginx
        name: nginx
        volumeMounts:
        - mountPath: /root
          subPath: noaccess
          name: noaccess
        resources:            # 1. Pod cannot be scheduled because of its resource request
          requests: 
            cpu: 500m
        livenessProbe:        # 3. livenessProbe causes restarts
          httpGet:
            path: wrong.html
            port: 80
          periodSeconds: 5
      volumes:
      - emptyDir: {}
        name: noaccess
status: {}
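
Why does a 500m request prevent scheduling on a small node? The scheduler only places a Pod if its CPU request fits into what is left of the node's allocatable CPU after the requests of the Pods already running there. The sketch below works through that arithmetic. The node figures (940m allocatable, 600m already requested) are hypothetical values for illustration, not measurements from the cluster used in this post.

```python
def parse_cpu_millicores(quantity):
    """Convert a Kubernetes CPU quantity such as "500m", "0.5" or "2" to millicores."""
    q = str(quantity).strip()
    if q.endswith("m"):
        return int(q[:-1])
    return int(round(float(q) * 1000))


def fits(allocatable, already_requested, new_request):
    """True if the new request fits into the CPU that is still unrequested on the node."""
    free = parse_cpu_millicores(allocatable) - parse_cpu_millicores(already_requested)
    return parse_cpu_millicores(new_request) <= free


# Hypothetical node: 940m allocatable, 600m already requested by system Pods.
print(fits("940m", "600m", "500m"))  # False
print(fits("940m", "600m", "50m"))   # True
```

With these assumed numbers only 340m is free, so a 500m request is rejected while a 50m request fits.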

Service

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    app: nginx
  name: nginx
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:                   # 4. No Pod matches the Service's label selector
    app: wrong  
  type: NodePort
status:
  loadBalancer: {}
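
A Service's equality-based selector picks up a Pod only when every key/value pair in the selector is present in the Pod's labels. Here is a minimal sketch of that rule. `selector_matches` is a hypothetical helper, and the `pod-template-hash` value is only illustrative.

```python
def selector_matches(selector, pod_labels):
    """True when every key/value pair of the selector appears in the Pod's labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Labels a Pod from the Deployment above would carry (hash value is illustrative).
pod_labels = {"app": "nginx", "pod-template-hash": "7574f45d75"}

print(selector_matches({"app": "wrong"}, pod_labels))  # False
print(selector_matches({"app": "nginx"}, pod_labels))  # True
```

Because `app: wrong` matches no Pod, the Service ends up without endpoints.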

HPA

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  creationTimestamp: null
  name: nginx
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: wrong               # 5. No Deployment matches the HPA's target
  targetCPUUtilizationPercentage: 50
status:
  currentReplicas: 0
  desiredReplicas: 0

I deployed the manifests above.

As expected, nothing is working properly.
The Deployment has no Ready Pods, and the HPA target shows as unknown.

$ kubectl get deploy,svc,hpa
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   0/1     1            0           24s

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
service/kubernetes   ClusterIP   34.118.224.1     <none>        443/TCP        157m
service/nginx        NodePort    34.118.231.224   <none>        80:30284/TCP   23s

NAME                                        REFERENCE          TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/nginx   Deployment/wrong   <unknown>/50%   1         5         0          24s

The Service has no endpoints either.


$ kubectl describe svc nginx | grep ^Endpoints
Endpoints:                <none>

Execution

I used the following commands as needed.

Query the AI backend for all active analyzers

k8sgpt analyze -b openai --explain --anonymize
k8sgpt analyze -b google --explain --anonymize

Adding and removing analyzers

k8sgpt filters add Log
k8sgpt filters remove Log

Restricting the run to a namespace and an analyzer

k8sgpt analyze -b openai --explain --filter=Log --namespace=default --anonymize
k8sgpt analyze -b google --explain --filter=Log --namespace=default --anonymize
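
When comparing backends it helps to run exactly the same analysis against each one. Here is a small Python sketch that builds the command line from the flags used above and runs it once per backend. `analyze_command` and `run_for_backends` are hypothetical helper names, and the sketch assumes `k8sgpt` is on your PATH with both backends already registered.

```python
import shutil
import subprocess


def analyze_command(backend, filters=(), namespace=None, anonymize=True):
    """Build the argv list for `k8sgpt analyze` using only the flags shown in this post."""
    cmd = ["k8sgpt", "analyze", "-b", backend, "--explain"]
    for name in filters:
        cmd.append(f"--filter={name}")
    if namespace:
        cmd.append(f"--namespace={namespace}")
    if anonymize:
        cmd.append("--anonymize")
    return cmd


def run_for_backends(backends=("openai", "google"), **kwargs):
    """Run the same analysis once per backend and return {backend: stdout}."""
    results = {}
    for backend in backends:
        proc = subprocess.run(
            analyze_command(backend, **kwargs),
            capture_output=True, text=True, check=False,
        )
        results[backend] = proc.stdout
    return results


if __name__ == "__main__":
    if shutil.which("k8sgpt"):
        outputs = run_for_backends(filters=["Log"], namespace="default")
        for backend, output in outputs.items():
            print(f"=== {backend} ===\n{output}")
    else:
        # Without the CLI installed, just show the command that would be run.
        print(" ".join(analyze_command("google", filters=["Log"], namespace="default")))
```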

Comparing the analysis results

First scan

OpenAI's response

AI Provider: openai

0: Service default/nginx()
- Error: Service has no endpoints, expected label app=wrong
Error: Service has no endpoints, expected label app=wrong
Solution:
1. Check the labels of the pods in the namespace.
2. Ensure that the pods have the label app=wrong.
3. If not, add the label to the pods using kubectl label command.
1: HorizontalPodAutoscaler default/nginx()
- Error: HorizontalPodAutoscaler uses Deployment/wrong as ScaleTargetRef which does not exist.
Error: HorizontalPodAutoscaler is referencing a Deployment that does not exist.

Solution:
1. Check the name of the Deployment referenced by the HorizontalPodAutoscaler.
2. Ensure the Deployment with that name exists in the cluster.
3. Update the HorizontalPodAutoscaler to reference the correct Deployment.
2: Pod default/nginx-7574f45d75-9plmb(Deployment/nginx)
- Error: 0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
Error: Not enough CPU resources available for the pod to be scheduled on any nodes.
Solution:
1. Check the CPU requests and limits set in the pod's YAML file.
2. Increase the CPU requests or limits if necessary.
3. Ensure that the nodes have enough CPU resources available.
4. Consider scaling up the cluster or optimizing resource usage.

Gemini's response

AI Provider: google

0: HorizontalPodAutoscaler default/nginx()
- Error: HorizontalPodAutoscaler uses Deployment/wrong as ScaleTargetRef which does not exist.
Error: The HorizontalPodAutoscaler is using a Deployment that does not exist as its ScaleTargetRef.
Solution:
1. Check if the Deployment named "wrong" exists.
2. If it does not exist, create it.
3. Update the HorizontalPodAutoscaler to use the correct Deployment name as its ScaleTargetRef.

1: Service default/nginx()
- Error: Service has no endpoints, expected label app=wrong
Error: The service does not have any endpoints because the label "app=wrong" is not found.
Solution:
1. Check if the pods have the correct label "app=correct".
2. If not, update the pod labels to "app=correct".
3. Restart the pods to apply the new labels.

2: Pod default/nginx-5dbfc8c779-427jv(Deployment/nginx)
- Error: 0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
Error: Insufficient CPU resources and no preemption victims found for incoming pod.
Solution:
1. Check if there are any pods that can be preempted.
2. If there are no preemptible pods, increase the node's CPU capacity.
3. If increasing the node's CPU capacity is not possible, consider using a different node or scaling down the incoming pod.

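I wonder why the lines in OpenAI's output are packed together, which makes it a little hard to read.
The actual output is in color, so it is somewhat easier to read than this.

As for the solutions each one proposed, I personally give the win to OpenAI.

For the Service, Gemini asks for the label to be changed to app=correct.
It may have been led by the word "wrong", but it felt like it jumped to a conclusion.
That costs it points.

For the unschedulable Pod, OpenAI goes as far as mentioning the Pod's requests and limits.
Excellent.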
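
Beyond the wording, it is worth checking that both backends were given the same set of findings. The sketch below pulls the result headers (for example `0: Service default/nginx()`) out of the text output and compares the kinds that were flagged. It assumes plain text as pasted in this post; real terminal output is colored and may need its escape codes stripped first. `flagged_resources` is a hypothetical helper.

```python
import re

# Matches header lines such as "2: Pod default/nginx-abc(Deployment/nginx)".
HEADER = re.compile(r"^(\d+): (\S+) ([^\s(]+)\((.*)\)$")


def flagged_resources(output):
    """Return the set of (kind, name) pairs found in result header lines."""
    found = set()
    for line in output.splitlines():
        match = HEADER.match(line.strip())
        if match:
            found.add((match.group(2), match.group(3)))
    return found


def kinds(output):
    """Sorted list of resource kinds, ignoring names (Pod names differ between runs)."""
    return sorted(kind for kind, _ in flagged_resources(output))


openai_out = """0: Service default/nginx()
1: HorizontalPodAutoscaler default/nginx()
2: Pod default/nginx-7574f45d75-9plmb(Deployment/nginx)"""

google_out = """0: HorizontalPodAutoscaler default/nginx()
1: Service default/nginx()
2: Pod default/nginx-5dbfc8c779-427jv(Deployment/nginx)"""

print(kinds(openai_out) == kinds(google_out))  # True
```

In the first scan both backends report the same three kinds, so the difference lies only in the explanations.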

I made the following fixes.

Changed the Pod's CPU request to 50m
Changed the HPA target to nginx
Changed the Service's label selector to app=nginx

---
         name: nginx
         resources:
           requests:
-            cpu: 500m
+            cpu: 50m
         terminationMessagePath: /dev/termination-log
         terminationMessagePolicy: File
         volumeMounts:

---
   scaleTargetRef:
     apiVersion: apps/v1
     kind: Deployment
-    name: wrong
+    name: nginx
   targetCPUUtilizationPercentage: 50
 status:
   currentReplicas: 0

---
     protocol: TCP
     targetPort: 80
   selector:
-    app: wrong
+    app: nginx
   sessionAffinity: None
   type: NodePort
 status:

Second scan

OpenAI's response

AI Provider: openai

0: Service default/nginx()
- Error: Service has not ready endpoints, pods: [Pod/nginx-7c55488c6f-f8tln], expected 1
Error: Service has no ready endpoints, pods: [Pod/nginx-7c55488c6f-f8tln], expected 1.
Solution:
1. Check if the pod is running and healthy.
2. Ensure the pod is correctly labeled to match the service selector.
3. Restart the pod if necessary.
4. Verify the service configuration and endpoints.
1: HorizontalPodAutoscaler default/nginx()
- Error: Deployment /nginx does not have resource configured.
Error: Deployment /nginx does not have resource configured.
Solution:
1. Check the deployment YAML file for any missing resource configurations.
2. Add the necessary resource configurations such as CPU and memory limits.
3. Apply the updated YAML file using kubectl apply -f <filename>.yaml.
2: Pod default/nginx-7c55488c6f-f8tln(Deployment/nginx)
- Error: the last termination reason is Error container=initialize pod=nginx-7c55488c6f-f8tln
Error: The container named "initialize" in the pod "nginx-7c55488c6f-f8tln" encountered an error during termination.
Solution: 1. Check the logs of the container for more details. 2. Restart the pod by deleting and recreating it. 3. Monitor for any recurring issues.

Gemini's response

AI Provider: google

0: Service default/nginx()
- Error: Service has not ready endpoints, pods: [Pod/nginx-7c55488c6f-n6mcv], expected 1
Error: The nginx service has no ready endpoints, meaning that the pods are not yet ready to serve traffic.
Solution:
1. Check the status of the pods using `kubectl get pods`.
2. If the pods are not running, try restarting them using `kubectl rollout restart deployment nginx`.
3. If the pods are still not ready, check the logs using `kubectl logs pod/nginx-7c55488c6f-n6mcv` to identify any errors.

1: HorizontalPodAutoscaler default/nginx()
- Error: Deployment /nginx does not have resource configured.
Error: Deployment is missing resource configuration.
Solution: Add resource configuration to the Deployment manifest. Example:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  template:
    spec:
      containers:
      - name: my-container
        image: my-image
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
```

2: Pod default/nginx-7c55488c6f-n6mcv(Deployment/nginx)
- Error: the last termination reason is Error container=initialize pod=nginx-7c55488c6f-n6mcv
Error: The last termination reason was an error in the initialize container of the nginx pod.
Solution:
1. Check the logs of the initialize container to identify the specific error.
2. Resolve the error in the initialize container.
3. Restart the nginx pod.

Once again OpenAI's output is slightly misformatted. I wonder why.

Gemini presents a manifest. That was a surprise.

At this point the Pod is failing in its initContainer.
The cause is a Permission Denied error during the initContainer's processing.
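
The Permission Denied follows from ordinary POSIX permission bits: the initContainer runs as UID/GID 1000 and tries to create a file in /root. Below is a simplified sketch of that check. It assumes /root is owned by root:root with mode 0700 (an assumption about the busybox image) and ignores supplementary groups, ACLs, and capabilities. `can_write` is a hypothetical helper.

```python
def can_write(uid, gid, owner_uid, owner_gid, mode):
    """Simplified POSIX check: may this user create a file in the given directory?"""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        bits = (mode >> 6) & 0o7   # owner bits
    elif gid == owner_gid:
        bits = (mode >> 3) & 0o7   # group bits
    else:
        bits = mode & 0o7          # other bits
    # Creating an entry needs both write (2) and search (1) on the directory.
    return (bits & 0o3) == 0o3


ROOT_HOME_MODE = 0o700  # assumed mode of /root, owned by root:root

print(can_write(1000, 1000, 0, 0, ROOT_HOME_MODE))  # False
print(can_write(0, 0, 0, 0, ROOT_HOME_MODE))        # True
```

The same function returns False for UID 1000 even with a more permissive 0755 directory, because "other" users still lack the write bit.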

As you would expect, both only go as far as telling you to check the logs.
I'll enable the Log analyzer and scan again with Log only.

$ k8sgpt filters add Log
Warning: by enabling logs, you will be sending potentially sensitive data to the AI backend.
Filter Log added

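As the warning says, check very carefully that the analysis targets contain no sensitive information.
To remove the added Log analyzer from the analysis targets, run the following.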

k8sgpt filters remove Log

OpenAI's response (Log only)

AI Provider: openai

0: Pod default/nginx-7c55488c6f-f8tln/nginx(Deployment/nginx)
- Error: Error the server rejected our request for an unknown reason (get pods nginx-7c55488c6f-f8tln) from Pod nginx-7c55488c6f-f8tln
Error: The server rejected the request for an unknown reason.

Solution:
1. Check if the Kubernetes cluster is running properly.
2. Verify the pod ID mentioned in the error message.
3. Check the permissions and access controls for the user making the request.
4. Restart the Kubernetes API server.

Gemini's response (Log only)

AI Provider: google

0: Pod default/nginx-7c55488c6f-n6mcv/nginx(Deployment/nginx)
- Error: Error the server rejected our request for an unknown reason (get pods nginx-7c55488c6f-n6mcv) from Pod nginx-7c55488c6f-n6mcv
Error: The Kubernetes API server rejected the request for an unknown reason.

Solution:
1. Check the Kubernetes API server logs for more details about the error.
2. Verify that the request is properly formatted and authorized.
3. Retry the request after a short delay.

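Both say the problem lies with the Kubernetes API server. This is a tough one.
There's no way around it, so I fixed the permission error myself.

I modified the Deployment as follows.
Running a container as root goes against best practice, but please bear with me.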

---
         name: initialize
         resources: {}
         securityContext:
-          runAsGroup: 1000
-          runAsUser: 1000
+          runAsGroup: 0
+          runAsUser: 0
         terminationMessagePath: /dev/termination-log
         terminationMessagePolicy: File
       restartPolicy: Always

Third scan

OpenAI's response

AI Provider: openai

0: Service default/nginx()
- Error: Service has not ready endpoints, pods: [Pod/nginx-67fd4f78fc-hqz7f], expected 1
Error: Service has no ready endpoints, pods: [Pod/nginx-67fd4f78fc-hqz7f], expected 1.
Solution:
1. Check if the pod is running and healthy.
2. Ensure the pod is correctly labeled and selected by the service.
3. Verify the pod's readiness probes are passing.
4. Restart the pod if necessary.
1: HorizontalPodAutoscaler default/nginx()
- Error: Deployment /nginx does not have resource configured.
Error: Deployment PlFASVQ does not have resource configured.
Solution:
1. Check the deployment YAML file for any missing or incorrect resource configurations.
2. Add the necessary resource configurations such as CPU and memory limits.
3. Apply the updated YAML file using kubectl apply -f <filename>.
2: Pod default/nginx-67fd4f78fc-hqz7f(Deployment/nginx)
- Error: the last termination reason is Completed container=nginx pod=nginx-67fd4f78fc-hqz7f
Error: The container nginx in pod nginx-67fd4f78fc-hqz7f has completed and terminated.
Solution:
1. Check the logs of the nginx container to ensure it ran successfully.
2. Verify the resources and configurations of the pod.
3. If needed, redeploy the pod with updated configurations.

Gemini's response

AI Provider: google

0: Service default/nginx()
- Error: Service has not ready endpoints, pods: [Pod/nginx-67fd4f78fc-r4l4q], expected 1
Error: The Service named 'nginx' does not have any ready endpoints. This means that there are no Pods that are ready to serve traffic.
Solution:
1. Check the status of the Pods that are part of the Service.
2. If any of the Pods are not ready, try to restart them.
3. If the Pods are still not ready, check the logs to see if there are any errors.
4. If there are errors, try to fix them and then restart the Pods.

1: HorizontalPodAutoscaler default/nginx()
- Error: Deployment /nginx does not have resource configured.
Error: The deployment "nginx" is missing a resource configuration.
Solution:
1. Edit the deployment manifest and add the required resource configuration.
2. Re-deploy the deployment.

2: Pod default/nginx-67fd4f78fc-r4l4q(Deployment/nginx)
- Error: the last termination reason is Completed container=nginx pod=nginx-67fd4f78fc-r4l4q
Error: The nginx container in the nginx-67fd4f78fc-r4l4q pod has completed and terminated.

Solution:
1. Check the logs of the nginx container to determine why it completed.
2. If the container completed successfully, no further action is required.
3. If the container completed due to an error, investigate the error and take appropriate action to resolve it.

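Log analysis was excluded for this scan.
The Pod is now restarting repeatedly because of its livenessProbe configuration.

Both offer only rough solutions along the lines of "check the Pod".
There is hardly any information to go on, so that's understandable.
Next, let's analyze the logs.

OpenAI's response (Log only)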

AI Provider: openai

0: Pod default/nginx-67fd4f78fc-hqz7f/nginx(Deployment/nginx)
- Error: 2024/08/15 02:39:51 [error] 29#29: *1 open() "/usr/share/nginx/html/wrong.html" failed (2: No such file or directory), client: 10.88.0.1, server: localhost, request: "GET /wrong.html HTTP/1.1", host: "10.88.0.10:80"
Error: The requested file "wrong.html" does not exist in the specified directory.

Solution:
1. Check if the file "wrong.html" exists in the directory "/usr/share/nginx/html".
2. If the file is missing, create or move the file to the correct directory.
3. Restart the nginx server to apply the changes.

Gemini's response (Log only)

AI Provider: google

0: Pod default/nginx-67fd4f78fc-r4l4q/nginx(Deployment/nginx)
- Error: 2024/08/14 09:55:18 [error] 29#29: *1 open() "/usr/share/nginx/html/wrong.html" failed (2: No such file or directory), client: 10.32.0.1, server: localhost, request: "GET /wrong.html HTTP/1.1", host: "10.32.0.42:80"
Error: Nginx cannot find the file "wrong.html" in the specified directory.
Solution:
1. Check if the file exists in the directory "/usr/share/nginx/html/".
2. If the file does not exist, create it.
3. Restart Nginx to apply the changes.

Both gave the answer I was hoping for. Excellent.

I modified the Deployment as follows.
The finding against the HPA is also fixed here.

---
         livenessProbe:
           failureThreshold: 3
           httpGet:
-            path: wrong.html
+            path: index.html
             port: 80
             scheme: HTTP
           periodSeconds: 5
...
           timeoutSeconds: 1
         name: nginx
         resources:
+          limits:
+            cpu: 100m
           requests:
             cpu: 50m
         terminationMessagePath: /dev/termination-log

After deploying the fixed manifests, no more problems were detected.

$ k8sgpt analyze
AI Provider: AI not used; --explain not set

No problems detected

The workload is running normally.
The Service has endpoints too.

$ kubectl get deploy,svc,hpa
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   1/1     1            1           4h30m

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
service/kubernetes   ClusterIP   34.118.224.1     <none>        443/TCP        7h6m
service/nginx        NodePort    34.118.231.224   <none>        80:30284/TCP   4h29m

NAME                                        REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/nginx   Deployment/nginx   2%/50%    1         5         1          4h30m

$ kubectl describe svc nginx | grep ^Endpoints
Endpoints:                10.32.0.43:80
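
The `2%/50%` in the TARGETS column above also explains why the HPA stays at one replica. The Kubernetes documentation gives the basic rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped to the min/max range. The sketch below applies that rule. It is a simplification that leaves out the controller's tolerance and stabilization behavior, and `desired_replicas` is a hypothetical helper.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas):
    """Basic HPA rule: scale proportionally to the metric ratio, then clamp to the bounds."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# Values from the output above: 1 replica at 2% utilization, target 50%, bounds 1..5.
print(desired_replicas(1, 2, 50, 1, 5))    # 1

# A made-up busy case: 2 replicas at 100% utilization would scale to 4.
print(desired_replicas(2, 100, 50, 1, 5))  # 4
```

CPU utilization here is measured relative to the container's CPU request, so with a 50m request the 2% reading corresponds to about 1m of actual usage.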

Cleanup

Deleting the workloads

kubectl delete hpa nginx
kubectl delete svc nginx
kubectl delete deploy nginx

Deleting the GKE cluster

gcloud container clusters delete k8sgpt-test \
    --location asia-northeast1-b \
    --project $PROJECT

Removing the AI backends

k8sgpt auth remove --backends openai
k8sgpt auth remove --backends google

If you no longer need the AI provider API keys you generated, delete them as well.

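Thoughts

This time I used OpenAI and Gemini to analyze a cluster with K8sGPT.

Overall, OpenAI's output was hard to read for some reason, but I felt its answers covered a wider range.
Gemini's answers were not wrong either, but it considered only one possibility, which felt misleading in a troubleshooting context. I was impressed that it presented a manifest.

As mentioned several times in this post, be careful not to send sensitive information to an AI provider.
The K8sGPT README also warns about this quite strongly.

Anyone who has the target cluster's kubeconfig can run K8sGPT, so from an administrator's point of view it seems hard to restrict its use. Use it with good IT literacy.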

Oshima
