
3.3 Autoscaling - Using Karpenter

You can scale OKE nodes with the Cluster Autoscaler, but Karpenter is another widely used tool for scaling Kubernetes nodes. karpenter-oci is a Karpenter implementation for OCI that supports OKE clusters. It is open source, maintained by Zoom, and scales nodes using OKE Self-Managed Nodes.

Preparing the OKE Cluster

Create an OKE cluster that meets the requirements for Self-Managed Nodes (a CLI sketch follows the list).

  • OKE version: 1.32.1

  • CNI

    • This test uses the Flannel CNI.
    • The VCN-Native Pod Networking CNI plugin is supported from 1.27.10.
  • Enhanced Cluster

  • Node pool

    • Base nodes are required to run the Karpenter deployment itself.
    • Create the pool with two or more nodes.
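
If you prefer the CLI to the console, creating a matching Enhanced cluster might look like the sketch below (all OCIDs are placeholders; networking options are simplified and should be adjusted to your VCN layout):

    oci ce cluster create \
      --compartment-id <compartment-ocid> \
      --name oke-cluster-karpenter \
      --kubernetes-version v1.32.1 \
      --type ENHANCED_CLUSTER \
      --vcn-id <vcn-ocid> \
      --endpoint-subnet-id <k8s-api-endpoint-subnet-ocid>
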
Creating OCI IAM Dynamic Groups and Policies
  1. Create a Dynamic Group for Workload Identity.

    • Name: e.g., oke-workload-type-dyn-grp

      ALL { resource.type='workload' }
      
  2. Create a Dynamic Group that includes the Compute instances to be added as Self-Managed nodes.

    • Name: e.g., oke-self-managed-node-dyn-grp

      ALL {instance.compartment.id = '<compartment-ocid>'}
      
  3. Create Policies for the Dynamic Groups you created (a CLI sketch for all of these resources follows).

    • Name: e.g., oke-karpenter-policy

      • Replace <compartment-name> with your own value.
      Allow dynamic-group oke-workload-type-dyn-grp to manage instance-family in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
      Allow dynamic-group oke-workload-type-dyn-grp to manage instances in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
      Allow dynamic-group oke-workload-type-dyn-grp to read instance-images in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
      Allow dynamic-group oke-workload-type-dyn-grp to read app-catalog-listing in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
      Allow dynamic-group oke-workload-type-dyn-grp to manage volume-family in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
      Allow dynamic-group oke-workload-type-dyn-grp to manage volume-attachments in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
      Allow dynamic-group oke-workload-type-dyn-grp to use volumes in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
      Allow dynamic-group oke-workload-type-dyn-grp to use virtual-network-family in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
      Allow dynamic-group oke-workload-type-dyn-grp to inspect vcns in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
      Allow dynamic-group oke-workload-type-dyn-grp to use subnets in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
      Allow dynamic-group oke-workload-type-dyn-grp to use network-security-groups in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
      Allow dynamic-group oke-workload-type-dyn-grp to use vnics in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
      Allow dynamic-group oke-workload-type-dyn-grp to use tag-namespaces in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
      
    • Name: e.g., oke-self-managed-node-policy

      • Replace <compartment-name> with your own value.
      Allow dynamic-group oke-self-managed-node-dyn-grp to {CLUSTER_JOIN} in compartment <compartment-name>
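
    All of the above can also be created from the CLI; a sketch (descriptions are illustrative, OCIDs and compartment names are placeholders):

      oci iam dynamic-group create \
        --name oke-workload-type-dyn-grp \
        --description "Workload Identity for karpenter" \
        --matching-rule "ALL { resource.type='workload' }"

      oci iam dynamic-group create \
        --name oke-self-managed-node-dyn-grp \
        --description "Instances to join as self-managed nodes" \
        --matching-rule "ALL {instance.compartment.id = '<compartment-ocid>'}"

      oci iam policy create \
        --compartment-id <compartment-ocid> \
        --name oke-self-managed-node-policy \
        --description "Allow self-managed nodes to join the cluster" \
        --statements '["Allow dynamic-group oke-self-managed-node-dyn-grp to {CLUSTER_JOIN} in compartment <compartment-name>"]'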
      
Creating a Tag Namespace
  1. Go to the Tag Namespaces page in the OCI Console, making sure you are in your Home Region.

  2. Create a new tag namespace.

    1. Name: oke-karpenter-ns-1
    2. Description: oke-karpenter-ns for karpenter-oci
  3. Create the following tag keys in the namespace you created (a CLI sketch follows the table).

    Key                                 Description
    karpenter_k8s_oracle/ocinodeclass   the name of the nodeclass used to create the instance
    karpenter_sh/managed-by             the OKE cluster name
    karpenter_sh/nodepool               the name of the nodepool used to create the instance
    karpenter_sh/nodeclaim              the name of the nodeclaim used to create the instance
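
    The CLI equivalent (run in the Home Region; for brevity this sketch reuses each key name as its description):

      TAG_NS_OCID=$(oci iam tag-namespace create --compartment-id <compartment-ocid> \
        --name oke-karpenter-ns-1 --description "oke-karpenter-ns for karpenter-oci" \
        --query data.id --raw-output)

      for key in karpenter_k8s_oracle/ocinodeclass karpenter_sh/managed-by \
                 karpenter_sh/nodepool karpenter_sh/nodeclaim; do
        oci iam tag create --tag-namespace-id "$TAG_NS_OCID" --name "$key" --description "$key"
      done
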
Creating a Custom OKE Node Image
  1. Look up the OCID of the OKE node image you will use.

  2. Create a Compute instance using that image OCID.

    1. Example
      1. Name: Oracle-Linux-8.10-2025.07.21-0-OKE-1.32.1-967
      2. Image OCID: ocid1.image.oc1.ap-tokyo-1.aaaaaaaawv22j3enzkxj6pkgdessrq3s26sz5phkf6ayrngptnoapmwrlxnq
  3. Check which container images the new node image should include, based on what the current two-node OKE cluster running Karpenter needs by default. If there are additional images you want to bake in, note their addresses separately.

    kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec['initContainers', 'containers'][*].image}" |\
    tr -s '[[:space:]]' '\n' |\
    sort |\
    uniq -c
    
  4. Connect to the instance you created and pull the images to include. The following is an example.

    sudo su
    systemctl start crio
    
    crictl pull ap-tokyo-1.ocir.io/axoxdievda5j/oke-public-cloud-provider-oci:v1.32-2c5fcd2e853-46-csi@sha256:fb9e892af78589a74bf8a85fa47af4de66cb97a5fbe33846a5e4380f97c024ec
    crictl pull ap-tokyo-1.ocir.io/axoxdievda5j/oke-public-proxymux-cli:13757ae5fa6989143755b4cc16867594bb5a88e9-135@sha256:5a6126751984df52b0d2a163e435de54b1b3cfefe3f87a2277e4a9344cb75c9d
    crictl pull ap-tokyo-1.ocir.io/id9y6mi8tcky/oke-public-cluster-autoscaler@sha256:03f592a6ada29dcb2f06b2a9ea5e7d8d425e630a5d9311b0ddf9b0d7ca187800
    crictl pull ap-tokyo-1.ocir.io/id9y6mi8tcky/oke-public-cluster-proportional-autoscaler-amd64@sha256:1908914e0c9055edd754a633de2a37fd6811a64565317f2f44bf4adea85f0654
    crictl pull ap-tokyo-1.ocir.io/id9y6mi8tcky/oke-public-coredns@sha256:e32e8482ef16dbfd86896ece95e81111d5cb110811a65c3ce85df0ce2b69ca17
    crictl pull ap-tokyo-1.ocir.io/id9y6mi8tcky/oke-public-flannel@sha256:1d40a538acef1404c92dca16e297265eeedbb517e69c91b1b66d0e5f0a2d0805
    crictl pull ap-tokyo-1.ocir.io/id9y6mi8tcky/oke-public-kube-proxy@sha256:bd652187ddd6b7ab04d1f6b6bc52c0b456f3902763e16cfc55c0e601af9b8db2
    
  5. Update OS packages if needed.

    sudo yum update
    
  6. Shut down the server.

    sudo shutdown now
    
  7. Create a custom image from the prepared VM (a CLI sketch follows).

    • Name: e.g., Oracle-Linux-8.10-2025.07.21-0-OKE-1.32.1-Custom-With-Images
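
    The CLI equivalent (the instance OCID is a placeholder for the VM prepared above):

      oci compute image create \
        --compartment-id <compartment-ocid> \
        --instance-id <prepared-instance-ocid> \
        --display-name Oracle-Linux-8.10-2025.07.21-0-OKE-1.32.1-Custom-With-Images
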
Installing Karpenter
  1. Add the karpenter-oci Helm repository.

    helm repo add karpenter-oci https://zoom.github.io/karpenter-oci
    helm repo update
    
  2. Install the karpenter chart.

    • Change clusterName, clusterEndpoint, clusterDns, compartmentId, and ociResourcePrincipalRegion to match your environment.

      helm install karpenter karpenter-oci/karpenter --version 1.4.2 \
      --namespace "karpenter" --create-namespace \
      --set "settings.clusterName=oke-cluster-karpenter" \
      --set "settings.clusterEndpoint=https://10.0.0.12:6443" \
      --set "settings.clusterDns=10.96.5.5" \
      --set "settings.compartmentId=ocid1.compartment.oc1..aaaaa..." \
      --set "settings.ociResourcePrincipalRegion=ap-tokyo-1" \
      --set "settings.tagNamespace=oke-karpenter-ns-1" \
      --set "settings.batchMaxDuration=10s" \
      --set "settings.batchIdleDuration=1s"
      
    • clusterEndpoint: the Kubernetes API private IPv4 endpoint, shown in the OKE cluster details

    • clusterDns: check it with the following command.

      kubectl get svc -n kube-system kube-dns
      NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
      kube-dns   ClusterIP   10.96.5.5    <none>        53/UDP,53/TCP,9153/TCP   89m
      
    • batchMaxDuration: the maximum length of a batching window; pending Pods that arrive within the window are handled together as one batch.

    • batchIdleDuration: if no new Pods arrive for this long after the last one, the batch is processed immediately, even before batchMaxDuration is reached. A quick post-install check follows.
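
    To confirm the installation, check that the controller Pods are running and that the logs show no credential or permission errors (assuming the chart names the deployment after the release, karpenter):

      kubectl -n karpenter get pods
      kubectl -n karpenter logs deploy/karpenter | head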

Creating a Node Pool

A NodePool sets constraints on the nodes Karpenter can create and the Pods that can run on those nodes. Configure it through the following steps.

  1. Configure the NodeClass

    This defines the nodes to be created. The example here is written for Flannel.

    • Example for Flannel - oke_ocinodeclasses_sample.yaml

    • Example for Native Pod Networking - oke_ocinodeclasses_native_cni_sample.yaml

    • Test example - karpenter-nodeclass.yaml

      apiVersion: karpenter.k8s.oracle/v1alpha1
      kind: OciNodeClass
      metadata:
        name: default-karpenter-nodeclass
      spec:
        bootConfig:
          bootVolumeSizeInGBs: 100
          bootVolumeVpusPerGB: 10
        imageSelector:
          - name: Oracle-Linux-8.10-2025.07.21-0-OKE-1.32.1-Custom-With-Images
            compartmentId: ocid1.compartment.oc1..aaaaa.....
        imageFamily: OracleOKELinux
        kubelet:
          evictionHard:
            imagefs.available: 15%
            imagefs.inodesFree: 10%
            memory.available: 750Mi
            nodefs.available: 10%
            nodefs.inodesFree: 5%
          systemReserved:
            memory: 100Mi
        subnetSelector:
          - name: oke-nodesubnet-quick-oke-cluster-karpenter-.....-regional
        vcnId: ocid1.vcn.oc1.ap-tokyo-1.amaaa.....
        securityGroupSelector:
          - name: test-security-group
      
      • spec.imageSelector.name: the name of the custom image created earlier
      • spec.imageSelector.compartmentId: the OCID of the compartment containing that custom image
      • subnetSelector.name: the name of the subnet the new nodes will join (a CLI lookup sketch follows step 4)
      • vcnId: the OCID of the VCN containing that subnet
      • securityGroupSelector.name: the name of the NSG the new nodes will join; unlike the documentation, this is required, and omitting it causes an error
  2. Configure the NodePool

    This sets the provisioning and disruption constraints for the nodes to be created.

    • Test example - karpenter-nodepool.yaml

      apiVersion: karpenter.sh/v1
      kind: NodePool
      metadata:
        name: oke-cluster-karpenter-nodepool
      spec:
        disruption:
          budgets:
            - nodes: 100%
          consolidateAfter: 1m0s
          consolidationPolicy: WhenEmptyOrUnderutilized
        limits:
          cpu: 160
          memory: 1280Gi
        template:
          spec:
            expireAfter: Never
            nodeClassRef:
              group: karpenter.k8s.oracle
              kind: OciNodeClass
              name: default-karpenter-nodeclass
            requirements:
              - key: karpenter.sh/capacity-type
                operator: In
                values:
                  - on-demand
              - key: karpenter.k8s.oracle/instance-shape-name
                operator: In
                values:
                  - VM.Standard.E4.Flex
              - key: karpenter.k8s.oracle/instance-cpu
                operator: In
                values:
                  - '8'
              - key: karpenter.k8s.oracle/instance-memory
                operator: In
                values:
                  - '65536'
              - key: kubernetes.io/os
                operator: In
                values:
                  - linux
            terminationGracePeriod: 5m
      
      • metadata.name: node instances are named after this value, e.g. oke-cluster-karpenter-nodepool-xxxxx
      • spec.disruption.consolidateAfter: how long after a Pod is added or deleted the controller attempts consolidation; this affects how long a node scale-in takes
      • key: karpenter.k8s.oracle/instance-shape-name: the OCI shape to use
      • key: karpenter.k8s.oracle/instance-cpu: in vCPUs (8 vCPUs = 4 OCPUs); multiple values can be listed if needed
      • key: karpenter.k8s.oracle/instance-memory: in MiB, not GB (65536 = 64 GiB); multiple values can be listed if needed
  3. Deploy the NodeClass. Afterwards, run the describe command and verify that every item, including the image, resolved without errors.

    $ kubectl apply -f karpenter-nodeclass.yaml
    ocinodeclass.karpenter.k8s.oracle/default-karpenter-nodeclass created
    $ kubectl get ocinodeclass
    NAME                          AGE
    default-karpenter-nodeclass   7s
    $ kubectl describe ocinodeclass default-karpenter-nodeclass
    Name:         default-karpenter-nodeclass
    ...
    Status:
      Conditions:
        Last Transition Time:  2025-11-03T07:03:14Z
        Message:
        Observed Generation:   1
        Reason:                ImageReady
        Status:                True
        Type:                  ImageReady
        Last Transition Time:  2025-11-03T07:03:14Z
        Message:
        Observed Generation:   1
        Reason:                SubnetsReady
        Status:                True
        Type:                  SubnetsReady
        Last Transition Time:  2025-11-03T07:03:14Z
        Message:
        Observed Generation:   1
        Reason:                SecurityGroupsReady
        Status:                True
        Type:                  SecurityGroupsReady
        Last Transition Time:  2025-11-03T07:03:14Z
        Message:
        Observed Generation:   1
        Reason:                Ready
        Status:                True
        Type:                  Ready
    ...
    
  4. Deploy the NodePool. Afterwards, verify that READY is True.

    $ kubectl apply -f karpenter-nodepool.yaml
    nodepool.karpenter.sh/oke-cluster-karpenter-nodepool created
    $ kubectl get nodepools.karpenter.sh
    NAME                             NODECLASS                     NODES   READY   AGE
    oke-cluster-karpenter-nodepool   default-karpenter-nodeclass   0       True    9s
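
    The subnet and NSG names referenced by subnetSelector and securityGroupSelector in step 1 can be looked up from the CLI; a sketch (OCIDs are placeholders):

      # Subnet display names in the cluster VCN
      oci network subnet list --compartment-id <compartment-ocid> \
        --vcn-id <vcn-ocid> --query 'data[]."display-name"'

      # NSG display names in the compartment
      oci network nsg list --compartment-id <compartment-ocid> \
        --query 'data[]."display-name"'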
    
Installing Agones & a Sample Game Server
  1. Prepare the images

    oci artifacts container repository create --compartment-id $COMPARTMENT_ID --display-name sandbox/agones-images/release/agones-sdk --is-public TRUE
    oci artifacts container repository create --compartment-id $COMPARTMENT_ID --display-name sandbox/agones-images/release/agones-allocator --is-public TRUE
    oci artifacts container repository create --compartment-id $COMPARTMENT_ID --display-name sandbox/agones-images/release/agones-controller --is-public TRUE
    oci artifacts container repository create --compartment-id $COMPARTMENT_ID --display-name sandbox/agones-images/release/agones-extensions --is-public TRUE
    oci artifacts container repository create --compartment-id $COMPARTMENT_ID --display-name sandbox/agones-images/release/agones-ping --is-public TRUE
    oci artifacts container repository create --compartment-id $COMPARTMENT_ID --display-name sandbox/agones-images/examples/supertuxkart-example --is-public TRUE
    
    NAMESPACE=`oci os ns get --query data --raw-output`
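
     # Log in to OCIR before pushing (assumption: the username format depends
     # on your identity provider; the password is an OCI auth token)
     docker login ap-tokyo-1.ocir.io -u "$NAMESPACE/<user-name>"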
    
     docker pull us-docker.pkg.dev/agones-images/release/agones-sdk:1.53.0
     docker tag us-docker.pkg.dev/agones-images/release/agones-sdk:1.53.0 ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-sdk:1.53.0
     docker push ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-sdk:1.53.0
     
     docker pull us-docker.pkg.dev/agones-images/release/agones-allocator:1.53.0
     docker tag us-docker.pkg.dev/agones-images/release/agones-allocator:1.53.0 ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-allocator:1.53.0
     docker push ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-allocator:1.53.0
     
     docker pull us-docker.pkg.dev/agones-images/release/agones-controller:1.53.0
     docker tag us-docker.pkg.dev/agones-images/release/agones-controller:1.53.0 ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-controller:1.53.0
     docker push ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-controller:1.53.0
     
     docker pull us-docker.pkg.dev/agones-images/release/agones-extensions:1.53.0
     docker tag us-docker.pkg.dev/agones-images/release/agones-extensions:1.53.0 ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-extensions:1.53.0
     docker push ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-extensions:1.53.0
     
     docker pull us-docker.pkg.dev/agones-images/release/agones-ping:1.53.0
     docker tag us-docker.pkg.dev/agones-images/release/agones-ping:1.53.0 ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-ping:1.53.0
     docker push ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-ping:1.53.0
     
     docker pull us-docker.pkg.dev/agones-images/examples/supertuxkart-example:0.19
     docker tag us-docker.pkg.dev/agones-images/examples/supertuxkart-example:0.19 ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/examples/supertuxkart-example:0.19
     docker push ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/examples/supertuxkart-example:0.19
     
    
  2. Install Agones

     helm repo add agones https://agones.dev/chart/stable
     helm repo update
     helm install my-agones --namespace agones-system \
     --create-namespace agones/agones \
     --set agones.image.registry=ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release \
     --set agones.ping.udp.expose=false
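
     # Sanity check: the Agones controller, extensions, allocator, and ping
     # Pods should come up in the agones-system namespace
     kubectl get pods -n agones-system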
    
  3. Download the game server example

     wget https://raw.githubusercontent.com/googleforgames/agones/release-1.53.0/examples/supertuxkart/fleet.yaml
    
  4. Edit fleet.yaml (a substitution sketch follows this step)

    ...
              containers:
                - name: supertuxkart
                  image: ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/examples/supertuxkart-example:0.19
                  resources:
                    limits:
                      cpu: 1000m
                      memory: 4Gi
                    requests:
                      cpu: 1000m
                      memory: 4Gi
              nodeSelector:
                karpenter.sh/nodepool: oke-cluster-karpenter-nodepool                  
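
    Note that $NAMESPACE in the image path is a placeholder that Kubernetes will not expand; one way to fill it in is with the shell variable set earlier (a sketch, assuming GNU sed):

      sed -i "s|\$NAMESPACE|$NAMESPACE|g" fleet.yaml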
    
Testing
  1. Check the current state

    $ kubectl get nodes
    NAME          STATUS   ROLES   AGE     VERSION
    10.0.10.247   Ready    node    5d11h   v1.32.1
    10.0.10.71    Ready    node    5d11h   v1.32.1
    $ kubectl get fleet
    No resources found in default namespace.
    
  2. Deploy

    kubectl apply -f fleet.yaml
    
  3. Scale out - Node Provisioning

    $ kubectl get nodes
    NAME          STATUS   ROLES    AGE     VERSION
    10.0.10.247   Ready    node     5d11h   v1.32.1
    10.0.10.71    Ready    node     5d11h   v1.32.1
    10.0.10.80    Ready    <none>   88s     v1.32.1
    $ kubectl get fleet
    NAME           SCHEDULING   DESIRED   CURRENT   ALLOCATED   READY   AGE
    supertuxkart   Packed       2         2         0           2       25s
    $ kubectl get pod -o wide
    NAME                       READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
    supertuxkart-5vkzk-5k7jb   2/2     Running   0          62s   10.244.1.130   10.0.10.80   <none>           <none>
    supertuxkart-5vkzk-5zrr9   2/2     Running   0          62s   10.244.1.131   10.0.10.80   <none>           <none>
    
    • A new node, 10.0.10.80, was created and the two GameServers were deployed onto it.
  4. Scale in - Node Consolidation

    kubectl scale fleet supertuxkart --replicas=0
    
  5. Check the node, GameServer, and Pod counts. The node Karpenter provisioned (10.0.10.80) has been removed.

    $ kubectl get fleet
    NAME           SCHEDULING   DESIRED   CURRENT   ALLOCATED   READY   AGE
    supertuxkart   Packed       0         0         0           0       7m15s
    $ kubectl get pod
    No resources found in default namespace.
    $ kubectl get nodes
    NAME          STATUS   ROLES   AGE     VERSION
    10.0.10.247   Ready    node    5d11h   v1.32.1
    10.0.10.71    Ready    node    5d11h   v1.32.1
    
Key Configuration Options
  1. Settings at Helm chart deployment

    Key                Description                                                      Default
    batchMaxDuration   The maximum length of a batch window. The longer this is,       10s
                       the more pods we can consider for provisioning at one time,
                       which usually results in fewer but larger nodes.
    batchIdleDuration  The maximum amount of time with no new pending pods that,       1s
                       if exceeded, ends the current batching window. If pods
                       arrive faster than this time, the batching window will be
                       extended up to the maxDuration. If they arrive slower, the
                       pods will be batched separately.
  2. karpenter NodePool settings

    Key                          Description                                                  Default
    disruption.consolidateAfter  The duration the controller will wait before attempting     0s
                                 to terminate nodes that are underutilized.
    disruption.budgets.nodes     The maximum number of NodeClaims owned by this NodePool     10%
                                 that can be terminating at once.
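
    Both kinds of settings can be changed after the fact; sketches, assuming the release and NodePool names used above:

      # Chart settings: adjust batching via helm upgrade, keeping other values
      helm upgrade karpenter karpenter-oci/karpenter --namespace karpenter \
        --reuse-values \
        --set settings.batchMaxDuration=30s \
        --set settings.batchIdleDuration=2s

      # NodePool settings: merge-patch the live resource
      kubectl patch nodepool oke-cluster-karpenter-nodepool --type merge \
        -p '{"spec":{"disruption":{"consolidateAfter":"5m0s","budgets":[{"nodes":"50%"}]}}}'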


This article was written in a personal capacity, on my own time. It may contain errors, and the opinions expressed are my own.

Last updated on 8 Nov 2025