Let's Build a Fluentd Cluster on GKE! (The Pump-Data-into-BigQuery Edition, of Course)

dokuma

2016.10.18

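(n‘∀‘)η Yahoo! My first post in two months! This is dokuma, the resident cutie at grasys ♪

This time I'm writing up what I did when I built a Fluentd cluster on GKE. Get through this and you might just be a GKE master. The content is fairly dense, so I've skipped most of the basics. Sorry about that.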

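Goal

Build a Fluentd cluster that starts with a single command, as easily as a managed service.

The key points are as follows.

Customize the official Fluentd Docker image
Assign scopes to the Nodes when creating the GKE cluster
Use a Kubernetes (k8s from here on) ConfigMap to change the Fluentd config
Attach a disk to the Pods

Customizing the Docker Image

First, grab the official Fluentd image. Casually docker run it.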

$ docker run fluent/fluentd:latest-onbuild
2016-10-18 08:19:02 +0000 [info]: reading config file path="/fluentd/etc/fluent.conf"
2016-10-18 08:19:02 +0000 [info]: starting fluentd-0.12.29
2016-10-18 08:19:02 +0000 [info]: gem 'fluentd' version '0.12.29'
2016-10-18 08:19:02 +0000 [info]: adding match in @mainstream pattern="docker.**" type="file"
2016-10-18 08:19:02 +0000 [info]: adding match in @mainstream pattern="**" type="file"
2016-10-18 08:19:02 +0000 [info]: adding filter pattern="**" type="stdout"
2016-10-18 08:19:02 +0000 [info]: adding source type="forward"
2016-10-18 08:19:02 +0000 [info]: using configuration file: <ROOT>
  <source>
    @type forward
    @id input1
    @label @mainstream
    port 24224
  </source>
  <filter **>
    @type stdout
  </filter>
  <label @mainstream>
    <match docker.**>
      @type file
      @id output_docker1
      path /fluentd/log/docker.*.log
      symlink_path /fluentd/log/docker.log
      append true
      time_slice_format %Y%m%d
      time_slice_wait 1m
      time_format %Y%m%dT%H%M%S%z
      buffer_path /fluentd/log/docker.*.log
    </match>
    <match **>
      @type file
      @id output1
      path /fluentd/log/data.*.log
      symlink_path /fluentd/log/data.log
      append true
      time_slice_format %Y%m%d
      time_slice_wait 10m
      time_format %Y%m%dT%H%M%S%z
      buffer_path /fluentd/log/data.*.log
    </match>
  </label>
</ROOT>
2016-10-18 08:19:02 +0000 [info]: listening fluent socket on 0.0.0.0:24224

Yep, it runs! Of course it does. We'll build everything on top of this image.

The official image is built on Alpine Linux and contains nothing unnecessary. Let's casually take a look inside.

$ docker run -it fluent/fluentd:latest-onbuild "/bin/sh"
~ $ cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.4.0
PRETTY_NAME="Alpine Linux v3.4"
HOME_URL="http://alpinelinux.org"
BUG_REPORT_URL="http://bugs.alpinelinux.org"

/fluentd appears to be the fluentd directory.

~ $ ls -l /
total 56
drwxr-xr-x    2 root     root          4096 Jun 23 19:37 bin
drwxr-xr-x    5 root     root           380 Oct 18 09:34 dev
drwxr-xr-x   18 root     root          4096 Oct 18 09:34 etc
drwxr-xr-x    9 fluent   fluent        4096 Sep  7 09:33 fluentd
drwxr-xr-x    5 root     root          4096 Oct 18 09:34 home
drwxr-xr-x    6 root     root          4096 Sep  7 09:32 lib
lrwxrwxrwx    1 root     root            12 Jun 23 19:37 linuxrc -> /bin/busybox
drwxr-xr-x    5 root     root          4096 Jun 23 19:37 media
drwxr-xr-x    2 root     root          4096 Jun 23 19:37 mnt
dr-xr-xr-x   90 root     root             0 Oct 18 09:34 proc
drwx------    3 root     root          4096 Sep  7 09:32 root
drwxr-xr-x    2 root     root          4096 Jun 23 19:37 run
drwxr-xr-x    2 root     root          4096 Sep  7 09:32 sbin
drwxr-xr-x    2 root     root          4096 Jun 23 19:37 srv
dr-xr-xr-x   12 root     root             0 Oct 18 09:34 sys
drwxrwxrwt    2 root     root          4096 Jun 23 19:37 tmp
drwxr-xr-x   12 root     root          4096 Sep  7 09:33 usr
drwxr-xr-x   12 root     root          4096 Jun 23 19:37 var
~ $ ls -l /fluentd/
total 12
drwxr-xr-x    2 fluent   fluent        4096 Sep  7 09:33 etc
drwxr-xr-x    2 fluent   fluent        4096 Sep  7 09:33 log
drwxr-xr-x    2 fluent   fluent        4096 Sep  7 09:33 plugins

This is the config that was used when we casually started the container earlier.

~ $ cat /fluentd/etc/fluent.conf
<source>
  @type  forward
  @id    input1
  @label @mainstream
  port  24224
</source>

<filter **>
  @type stdout
</filter>

<label @mainstream>
  <match docker.**>
    @type file
    @id   output_docker1
    path         /fluentd/log/docker.*.log
    symlink_path /fluentd/log/docker.log
    append       true
    time_slice_format %Y%m%d
    time_slice_wait   1m
    time_format       %Y%m%dT%H%M%S%z
  </match>
  <match **>
    @type file
    @id   output1
    path         /fluentd/log/data.*.log
    symlink_path /fluentd/log/data.log
    append       true
    time_slice_format %Y%m%d
    time_slice_wait   10m
    time_format       %Y%m%dT%H%M%S%z
  </match>
</label>
~ $

At this point, the following requirements come to mind.

Requirements

I want to swap out the config.
The plan is to pump data into BigQuery, so fluent-plugin-bigquery is a must.

That raises a couple of issues.

Task

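¹The official documentation describes controlling the config with parameters at Docker startup, but I want to keep the Dockerfile generic and separate from the config.
The bigdecimal library, which fluent-plugin-bigquery requires, is missing, so gem install fails.

Here are the solutions.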

Solution

Use a k8s ConfigMap, described later.
Set up an environment in which the bigdecimal library can be installed.

The only one that has to be done at this point is number 2.

Let’s play !

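Set up an environment that can build the bigdecimal library. This involves installing a build toolchain, so docker build would take too long and would ²bloat the image size. I use docker commit instead.

Start the customization by running docker run as the root user.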

$ docker run --rm -it -u root fluent/fluentd:latest-onbuild "/bin/sh"

Install the build environment

# apk add --update alpine-sdk --no-cache
# apk add --update ruby-dev --no-cache

I want to keep the gem PATH under /home/fluent, and fluentd runs as the fluent user, so switch to the fluent user.

# su - fluent

Set the gem PATH

~$ PATH=/home/fluent/.gem/ruby/2.3.0/bin:$PATH

Instaaaaall the required gems!

~$ gem install bigdecimal
~$ gem install fluent-plugin-forest
~$ gem install fluent-plugin-bigquery

Check

~$ gem list

*** LOCAL GEMS ***

activesupport (4.2.7.1)
addressable (2.4.0)
bigdecimal (1.2.7)
cool.io (1.4.5)
did_you_mean (1.0.0)
faraday (0.9.2)
fluent-mixin-config-placeholders (0.4.0)
fluent-mixin-plaintextformatter (0.2.6)
fluent-plugin-bigquery (0.3.0)
fluent-plugin-buffer-lightening (0.0.2)
fluent-plugin-forest (0.3.3)
fluentd (0.12.29)
google-api-client (0.9.18)
googleauth (0.5.1)
http_parser.rb (0.6.0)
httpclient (2.8.2.4)
hurley (0.2)
i18n (0.7.0)
json (2.0.2, 1.8.3)
jwt (1.5.6)
little-plugger (1.1.4)
logging (2.1.0)
ltsv (0.1.0)
memoist (0.15.0)
mime-types (3.1)
mime-types-data (3.2016.0521)
minitest (5.8.3)
msgpack (0.5.12)
multi_json (1.12.1)
multipart-post (2.0.0)
net-telnet (0.1.1)
oj (2.17.4)
os (0.9.6)
power_assert (0.2.6)
psych (2.0.17)
rake (10.4.2)
representable (2.3.0)
retriable (2.1.0)
sigdump (0.2.4)
signet (0.7.3)
string-scrub (0.0.5)
test-unit (3.1.5)
thread_safe (0.3.5)
tzinfo (1.2.2)
tzinfo-data (1.2016.6)
uber (0.0.15)
uuidtools (2.1.5)
yajl-ruby (1.2.1)

The build environment is no longer needed, so remove it. Press Ctrl+D to return to root, then delete!

# apk del alpine-sdk
# apk del ruby-dev

In another terminal, run docker commit.

$ docker ps
CONTAINER ID        IMAGE                           COMMAND             CREATED             STATUS              PORTS                 NAMES
65df9f914a3b        fluent/fluentd:latest-onbuild   "/bin/sh"           32 minutes ago      Up 32 minutes       5140/tcp, 24224/tcp   modest_yalow
$ docker commit 65df9f914a3b fluentd:forblog
sha256:3fbdc0eb48cbd3b70188e2c2ffa1ac22a059b957210b23f3f9c5ef790ed360ee
$ docker images
REPOSITORY                                  TAG                 IMAGE ID            CREATED             SIZE
fluentd                                     forblog             3fbdc0eb48cb        16 seconds ago      74.11 MB

Custom image complete!

Using this image as the base, run docker build to create the image that will be used on GKE. The Dockerfile is as follows.

FROM fluentd:forblog
MAINTAINER grasys
USER root
RUN gem uninstall json --version 2.0.2
USER fluent
WORKDIR /home/fluent
ENV PATH /home/fluent/.gem/ruby/2.3.0/bin:$PATH
EXPOSE 24224
CMD fluentd -c /fluentd/etc/$FLUENTD_CONF -p /fluentd/plugins $FLUENTD_OPT

$ docker build -t gcr.io/<your project id>/fluentd:forblog .
Sending build context to Docker daemon  25.6 kB
Step 1 : FROM fluentd:forblog
 ---> 3fbdc0eb48cb
Step 2 : MAINTAINER grasys
 ---> Running in 28a457de2780
 ---> 8a9d1594806b
Removing intermediate container 28a457de2780
Step 3 : USER fluent
 ---> Running in 5c9658fc8c03
 ---> 9792ab88d7e5
Removing intermediate container 5c9658fc8c03
Step 4 : WORKDIR /home/fluent
 ---> Running in 2604d1743cff
 ---> 83f100824890
Removing intermediate container 2604d1743cff
Step 5 : ENV PATH /home/fluent/.gem/ruby/2.3.0/bin:$PATH
 ---> Running in 2c6675c33cab
 ---> 531f6e00d498
Removing intermediate container 2c6675c33cab
Step 6 : EXPOSE 24224
 ---> Running in 0477000ca406
 ---> c928b06d4480
Removing intermediate container 0477000ca406
Step 7 : CMD fluentd -c /fluentd/etc/$FLUENTD_CONF -p /fluentd/plugins $FLUENTD_OPT
 ---> Running in 0bc116385fb3
 ---> b93af116d6b8
Removing intermediate container 0bc116385fb3
Successfully built b93af116d6b8

Now push the image to the GKE repository (gcr.io).

$ gcloud docker push gcr.io/<your project id>/fluentd:forblog

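That completes the custom Docker image. Yay!

Assigning Scopes to Nodes When Creating the GKE Cluster

Next, create the GKE cluster.

The key point here is to give the Nodes (GCE instances) BigQuery write permission and to build the cluster so that it uses that permission.

Well, it's ³straight from the documentation (´Д｀)

I've also added an option to spread the Nodes across multiple zones. In this case, the Node count specified with --num-nodes is the count for each zone. In the following example, one Node each is assigned to asia-east1-a and asia-east1-b, for two Nodes in total.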

$ gcloud container clusters create fluentd-cluster001 \
--zone asia-east1-a --additional-zones asia-east1-b \
--num-nodes 1 \
--machine-type g1-small \
--scopes compute-rw,bigquery
Creating cluster fluentd-cluster001...done.
Created [https://container.googleapis.com/v1/projects/<your project id>/zones/asia-east1-a/clusters/fluentd-cluster001].
kubeconfig entry generated for fluentd-cluster001.
NAME                ZONE          MASTER_VERSION  MASTER_IP        MACHINE_TYPE  NODE_VERSION  NUM_NODES  STATUS
fluentd-cluster001  asia-east1-a  1.4.3           104.199.201.228  g1-small      1.4.3
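
To make the per-zone arithmetic concrete, here is a tiny Python sketch. The helper name total_nodes is my own invention for illustration and is not part of gcloud; it only assumes what is described above, namely that --num-nodes applies to the primary zone and to each additional zone.

```python
def total_nodes(num_nodes_per_zone, zone, additional_zones):
    """Return the total Node count for a multi-zone cluster.

    Hypothetical helper for illustration: --num-nodes applies to each zone,
    so the total is the per-zone count times the number of distinct zones.
    """
    zones = {zone, *additional_zones}
    return num_nodes_per_zone * len(zones)


# Values taken from the gcloud command in this section.
print(total_nodes(1, "asia-east1-a", ["asia-east1-b"]))  # 2
```

Bumping --num-nodes to 3 with the same two zones would therefore give six Nodes, which is worth keeping in mind when estimating cost.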

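Using a k8s ConfigMap to Change the Fluentd Config

I want to apply an arbitrary config to the image we built, so create the following files.

fluent-config
schema-json

Register these files as ConfigMaps so that they are available through the k8s API.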

# create
$ kubectl create configmap analysis-base-config --from-file=fluent-config
$ kubectl create configmap analysis-base-schema --from-file=schema-json

# describe
$ kubectl describe configmaps analysis-base-config
$ kubectl describe configmaps analysis-base-schema

# get
$ kubectl get configmaps analysis-base-config -o yaml
$ kubectl get configmaps analysis-base-schema -o yaml
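
As a rough picture of what kubectl create configmap --from-file ends up storing, here is a Python sketch that builds an equivalent manifest as a plain dict. The function name configmap_manifest and the shortened file content are placeholders of mine; the point is only that the file name becomes the key under data, which is the key the Deployment refers to later.

```python
import json


def configmap_manifest(name, files):
    """Build a ConfigMap manifest as a plain dict.

    `files` maps a file name to its text content. Each file name becomes
    a key under `data`, which is how --from-file=<file> names its key.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": dict(files),
    }


# Placeholder content; the real file contents are listed in this post.
manifest = configmap_manifest(
    "analysis-base-config",
    {"fluent-config": "<source>\n  @type forward\n</source>\n"},
)
print(json.dumps(manifest, indent=2))
```

The printed JSON should look much like the output of kubectl get configmaps analysis-base-config -o yaml, minus the metadata that the API server adds.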

The contents of each file are as follows.

fluent-config

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<match ab.log>
  @type forest
  subtype copy
  <template>
    <store>
      @type bigquery
      method insert

      auth_method compute_engine

      project <your project id>
      dataset <your dataset>

      flush_interval 1
      buffer_chunk_records_limit 1000
      buffer_queue_limit 1024
      num_threads 16

      auto_create_table true
      table test_%Y_%m

      time_format %s
      time_field time
      schema_path /etc/opt/schema.json
    </store>
  </template>
</match>
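
The table test_%Y_%m line uses strftime-style placeholders, so rows land in a per-month table. The Python sketch below shows the expansion. Note the assumptions: the helper expand_table_name is hypothetical, and I use UTC here, whereas which clock and time zone the plugin actually uses depends on the plugin version and settings. The epoch value is the one that shows up in the query result at the end of this post.

```python
from datetime import datetime, timezone


def expand_table_name(template, epoch_seconds):
    """Expand strftime placeholders in a table template using UTC time.

    Hypothetical helper: UTC is an assumption made for this sketch.
    """
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime(template)


print(expand_table_name("test_%Y_%m", 1476862212))  # test_2016_10
```

With auto_create_table true, a new table such as test_2016_11 would be created once the month rolls over.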

schema-json

[
  {"name": "time", "type": "integer", "mode": "required"},
  {"name": "message", "type": "string", "mode": "required"}
]
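
Both fields are required, so a record missing either one does not fit this schema. If you want to check sample records locally before sending them, something like the following Python sketch works. It is my own pre-check, not how the plugin or BigQuery validates rows, and it only knows the two types used in this schema.

```python
import json

SCHEMA_JSON = """
[
  {"name": "time", "type": "integer", "mode": "required"},
  {"name": "message", "type": "string", "mode": "required"}
]
"""

# Only the two types used in this schema are mapped (an assumption of this sketch).
PYTHON_TYPES = {"integer": int, "string": str}


def validate_record(record, schema):
    """Return a list of problems found in `record`; empty means it fits."""
    problems = []
    for field in schema:
        name = field["name"]
        if name not in record:
            if field.get("mode") == "required":
                problems.append(f"missing required field: {name}")
            continue
        expected = PYTHON_TYPES[field["type"]]
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{name} should be {field['type']}")
    return problems


schema = json.loads(SCHEMA_JSON)
print(validate_record({"time": 1476862212, "message": "hello"}, schema))  # []
print(validate_record({"message": 42}, schema))
```

The second call reports both the missing time field and the wrongly typed message.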

Attaching a Disk to the Pods

Prepare a disk to store the fluentd logs. First, create the disks on GCE.

$ gcloud compute disks create --size=10GB --zone=asia-east1-a log-pd
$ gcloud compute disks create --size=10GB --zone=asia-east1-b log-pd

Set up a PersistentVolume and a PersistentVolumeClaim for the k8s API to use.

$ kubectl create -f kpi-analysis_pv.yaml
$ kubectl create -f kpi-analysis_pvc.yaml

kpi-analysis_pv.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: log-pd
  labels:
    app: fluentd
    environment: dev
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  gcePersistentDisk:
    pdName: log-pd
    fsType: ext4

kpi-analysis_pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fluentd-log-pvc
  labels:
    app: fluentd
    environment: dev
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Now all the groundwork is in place.

Here we go

$ kubectl create -f kpi-analysis_deployments.yaml
$ kubectl create -f kpi-analysis_service.yaml

The Service is now up and hosted.

After a few dozen seconds, an external IP is assigned to the Service.

$ kubectl get services
NAME           CLUSTER-IP       EXTERNAL-IP      PORT(S)     AGE
kpi-analysis   10.171.245.105   104.199.212.82   24224/TCP   1m
kubernetes     10.171.240.1     <none>           443/TCP     41m

kpi-analysis_deployments.yaml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kpi-analysis
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: fluentd
        environment: dev
    spec:
      securityContext:
        fsGroup: 1000
      containers:
        - name: kpi-analysis
          image: gcr.io/<your project id>/fluentd:forblog
          env:
            - name: FLUENTD_CONF
              value: analysis-base.conf
            - name: FLUENTD_OPT
              value: "--log /fluentd/log/fluentd.log"
          ports:
            - containerPort: 24224
              name: fluentd
          volumeMounts:
            - mountPath: /fluentd/etc
              name: config
            - mountPath: /etc/opt
              name: schema
            - mountPath: /fluentd/log
              name: log
      volumes:
        - name: config
          configMap:
            name: analysis-base-config
            items:
              - key: fluent-config
                path: analysis-base.conf
        - name: schema
          configMap:
            name: analysis-base-schema
            items:
              - key: schema-json
                path: schema.json
        - name: log
          persistentVolumeClaim:
            claimName: fluentd-log-pvc
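
It may help to see how the env values in this Deployment meet the CMD line of the Dockerfile. The config volume mounts the fluent-config key as analysis-base.conf under /fluentd/etc, and FLUENTD_CONF points at that file name. The Python sketch below just does the variable substitution; resolve_cmd is a hypothetical helper, and string.Template only mimics shell expansion for this simple case.

```python
from string import Template

# CMD line from the Dockerfile in this post.
CMD = "fluentd -c /fluentd/etc/$FLUENTD_CONF -p /fluentd/plugins $FLUENTD_OPT"

# env values from kpi-analysis_deployments.yaml.
ENV = {
    "FLUENTD_CONF": "analysis-base.conf",
    "FLUENTD_OPT": "--log /fluentd/log/fluentd.log",
}


def resolve_cmd(cmd, env):
    """Substitute $NAME variables the way a shell would for this simple case."""
    return Template(cmd).substitute(env)


print(resolve_cmd(CMD, ENV))
# fluentd -c /fluentd/etc/analysis-base.conf -p /fluentd/plugins --log /fluentd/log/fluentd.log
```

So the ConfigMap supplies the config file, and the log volume receives fluentd.log, without rebuilding the image.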

kpi-analysis_service.yaml

apiVersion: v1
kind: Service
metadata:
  name: kpi-analysis
spec:
  ports:
    - port: 24224
      targetPort: 24224
  selector:
    app: fluentd
  type: LoadBalancer
  loadBalancerSourceRanges:
    - 130.211.0.0/22
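
loadBalancerSourceRanges limits which client addresses may reach the Service, so only senders inside 130.211.0.0/22 get through. If you are unsure whether a given sender falls in the range, Python's standard ipaddress module can tell you. The helper is_allowed and the sample addresses are made up for this sketch.

```python
import ipaddress

# Range from loadBalancerSourceRanges in kpi-analysis_service.yaml.
ALLOWED_RANGES = ("130.211.0.0/22",)


def is_allowed(client_ip, ranges=ALLOWED_RANGES):
    """Return True if client_ip falls inside any allowed CIDR range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in ranges)


print(is_allowed("130.211.2.15"))   # True
print(is_allowed("130.211.4.1"))    # False
print(is_allowed("203.0.113.10"))   # False
```

A /22 covers 130.211.0.0 through 130.211.3.255, which is why 130.211.4.1 is already outside.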

Shall we try fluent-cat?

$ echo '{"message":"結構疲れたよ・・・"}' | fluent-cat -h 104.199.212.82 ab.log

If this message (roughly "I'm pretty worn out...") has been pumped into BigQuery as specified in the fluentd config, it worked.

$ bq query 'select time,message from <your dataset>.test_2016_10 limit 10;'
Waiting on bqjob_r7701580834343d06_00000157dbdc265e_1 ... (0s) Current status: DONE
+------------+-----------+
|    time    |  message  |
+------------+-----------+
| 1476862212 | 結構疲れたよ・・・ |
+------------+-----------+

(　´∀｀)b Good!
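
If you don't have fluent-cat at hand, the same kind of event can in principle be sent from Python. Treat the following as a sketch under a loud assumption: I am relying on the forward input accepting a JSON-encoded [tag, time, record] array in addition to MessagePack, which I believe holds for the 0.12 series but have not verified against this cluster. The function names are mine, and the actual send is left commented out.

```python
import json
import socket


def build_forward_message(tag, record, epoch_seconds):
    """Serialize one event as a JSON array: [tag, time, record].

    Assumption: the receiving forward input accepts JSON-encoded events.
    """
    message = [tag, epoch_seconds, record]
    return json.dumps(message, ensure_ascii=False).encode("utf-8")


def send_event(host, port, payload, timeout=5.0):
    """Open a TCP connection, send the payload, and close."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)


payload = build_forward_message("ab.log", {"message": "hello"}, 1476862212)
print(payload)

# To actually send it (needs a reachable Service and an allowed source IP):
# send_event("104.199.212.82", 24224, payload)
```

The tag ab.log matches the match block in the fluentd config, so the event would take the same route as the fluent-cat one.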

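Thoughts

It turned out rather different from the picture of something that starts up as easily as a managed service... Still, everything is captured in the Dockerfile and the k8s config, so you do get the benefits of having it as code. I haven't covered changing the fluentd config or real-world operations topics, but they are pretty much what you'd imagine.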

¹ Official fluent image
² With docker build, even if you uninstall the build environment, the layer containing it remains in the image and the size stays large.
³ gcloud container clusters create
