3.3 Autoscaling - Using Karpenter
OKE nodes can be scaled with the Cluster Autoscaler. Another widely used tool for Kubernetes node scaling is Karpenter. karpenter-oci is a Karpenter implementation for OCI that supports OKE clusters. It is an open-source project from Zoom and scales nodes using OKE Self-Managed Nodes.
Preparing the OKE Cluster
Create an OKE cluster that meets the requirements for Self-Managed Nodes.
- OKE version: 1.32.1
- CNI
  - This test uses the Flannel CNI
  - The VCN-Native Pod Networking CNI plugin is supported from 1.27.10
- Enhanced Cluster
- Node pool
  - A base node pool is required to run Karpenter itself
  - Create it with 2 or more nodes
Creating OCI IAM Dynamic Groups and Policies
- Create a Dynamic Group for Workload Identity.
  - Name: e.g., oke-workload-type-dyn-grp
  - Rule: ALL { resource.type='workload' }
- Create a Dynamic Group that includes the Compute instances to be added as Self-Managed nodes.
  - Name: e.g., oke-self-managed-node-dyn-grp
  - Rule: ALL {instance.compartment.id = '<compartment-ocid>'}
- Create Policies for the Dynamic Groups you created.
  - Name: e.g., oke-karpenter-policy (replace <compartment-name> with your own value)

    Allow dynamic-group oke-workload-type-dyn-grp to manage instance-family in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
    Allow dynamic-group oke-workload-type-dyn-grp to manage instances in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
    Allow dynamic-group oke-workload-type-dyn-grp to read instance-images in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
    Allow dynamic-group oke-workload-type-dyn-grp to read app-catalog-listing in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
    Allow dynamic-group oke-workload-type-dyn-grp to manage volume-family in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
    Allow dynamic-group oke-workload-type-dyn-grp to manage volume-attachments in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
    Allow dynamic-group oke-workload-type-dyn-grp to use volumes in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
    Allow dynamic-group oke-workload-type-dyn-grp to use virtual-network-family in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
    Allow dynamic-group oke-workload-type-dyn-grp to inspect vcns in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
    Allow dynamic-group oke-workload-type-dyn-grp to use subnets in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
    Allow dynamic-group oke-workload-type-dyn-grp to use network-security-groups in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
    Allow dynamic-group oke-workload-type-dyn-grp to use vnics in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
    Allow dynamic-group oke-workload-type-dyn-grp to use tag-namespaces in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
  - Name: e.g., oke-self-managed-node-policy (replace <compartment-name> with your own value)

    Allow dynamic-group oke-self-managed-node-dyn-grp to {CLUSTER_JOIN} in compartment <compartment-name>
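As a small sketch (not part of the original guide), the `<compartment-name>` placeholder in the statements above can be filled in with `sed` before pasting the policies into the console; the compartment name below is an example.

```shell
# Render policy statements for a concrete compartment (name is an example).
COMPARTMENT="my-compartment"
RENDERED=$(sed "s/<compartment-name>/$COMPARTMENT/g" <<'EOF'
Allow dynamic-group oke-workload-type-dyn-grp to manage instance-family in compartment <compartment-name> where all {request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
Allow dynamic-group oke-self-managed-node-dyn-grp to {CLUSTER_JOIN} in compartment <compartment-name>
EOF
)
echo "$RENDERED"
```

The same substitution works for the full statement list; only the placeholder changes.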
Creating a Tag Namespace
- Go to the OCI Console Tag Namespaces page. Make sure you are in your Home Region.
- Create a new tag namespace.
  - Name: oke-karpenter-ns-1
  - Description: oke-karpenter-ns for karpenter-oci
- Create the following tag keys inside the namespace you created.
  Key                                 Description
  karpenter_k8s_oracle/ocinodeclass   the name of the nodeclass used to create the instance
  karpenter_sh/managed-by             the OKE cluster name
  karpenter_sh/nodepool               the name of the nodepool used to create the instance
  karpenter_sh/nodeclaim              the name of the nodeclaim used to create the instance
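The tag keys can also be created from the CLI. The sketch below only prints the assumed `oci iam tag create` commands rather than running them; the tag namespace OCID is a placeholder.

```shell
# Print (not run) one `oci iam tag create` command per Karpenter tag key.
TAG_NS_OCID="ocid1.tagnamespace.oc1..example"   # placeholder: OCID of oke-karpenter-ns-1
CMDS=$(for key in karpenter_k8s_oracle/ocinodeclass karpenter_sh/managed-by \
                  karpenter_sh/nodepool karpenter_sh/nodeclaim; do
  echo "oci iam tag create --tag-namespace-id $TAG_NS_OCID --name $key --description 'karpenter tag key'"
done)
echo "$CMDS"
```

Remove the `echo` wrapper (and substitute the real OCID) to actually create the keys.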
Creating a Custom OKE Node Image
- Click the link below and note the OCID of the image to use.
  - Use one of the images released after March 28, 2023
- Create a Compute instance from the image OCID.
  - Example
    - Name: Oracle-Linux-8.10-2025.07.21-0-OKE-1.32.1-967
    - Image OCID: ocid1.image.oc1.ap-tokyo-1.aaaaaaaawv22j3enzkxj6pkgdessrq3s26sz5phkf6ayrngptnoapmwrlxnq
- Identify the container images to include in the new OKE node image. Based on the two base nodes running Karpenter in the current OKE cluster, check which container images are required by default. If there are additional images you want to include, note their addresses separately.
kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec['initContainers', 'containers'][*].image}" |\
  tr -s '[[:space:]]' '\n' |\
  sort |\
  uniq -c
- Connect to the instance you created and pull the images to include. The following is an example.
sudo su
systemctl start crio
crictl pull ap-tokyo-1.ocir.io/axoxdievda5j/oke-public-cloud-provider-oci:v1.32-2c5fcd2e853-46-csi@sha256:fb9e892af78589a74bf8a85fa47af4de66cb97a5fbe33846a5e4380f97c024ec
crictl pull ap-tokyo-1.ocir.io/axoxdievda5j/oke-public-proxymux-cli:13757ae5fa6989143755b4cc16867594bb5a88e9-135@sha256:5a6126751984df52b0d2a163e435de54b1b3cfefe3f87a2277e4a9344cb75c9d
crictl pull ap-tokyo-1.ocir.io/id9y6mi8tcky/oke-public-cluster-autoscaler@sha256:03f592a6ada29dcb2f06b2a9ea5e7d8d425e630a5d9311b0ddf9b0d7ca187800
crictl pull ap-tokyo-1.ocir.io/id9y6mi8tcky/oke-public-cluster-proportional-autoscaler-amd64@sha256:1908914e0c9055edd754a633de2a37fd6811a64565317f2f44bf4adea85f0654
crictl pull ap-tokyo-1.ocir.io/id9y6mi8tcky/oke-public-coredns@sha256:e32e8482ef16dbfd86896ece95e81111d5cb110811a65c3ce85df0ce2b69ca17
crictl pull ap-tokyo-1.ocir.io/id9y6mi8tcky/oke-public-flannel@sha256:1d40a538acef1404c92dca16e297265eeedbb517e69c91b1b66d0e5f0a2d0805
crictl pull ap-tokyo-1.ocir.io/id9y6mi8tcky/oke-public-kube-proxy@sha256:bd652187ddd6b7ab04d1f6b6bc52c0b456f3902763e16cfc55c0e601af9b8db2
- Update OS packages if needed.

  sudo yum update
- Shut down the server.

  sudo shutdown now
- Create a custom image from the VM you prepared.
  - Name: e.g., Oracle-Linux-8.10-2025.07.21-0-OKE-1.32.1-Custom-With-Images
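The console step above can also be done from the CLI. This sketch only prints the assumed `oci compute image create` command; the instance OCID is a placeholder.

```shell
# Print (not run) the CLI equivalent of the console "create custom image" step.
INSTANCE_OCID="ocid1.instance.oc1..example"   # placeholder: the prepared VM's OCID
CMD="oci compute image create --instance-id $INSTANCE_OCID --display-name Oracle-Linux-8.10-2025.07.21-0-OKE-1.32.1-Custom-With-Images"
echo "$CMD"
```

Substitute the real instance OCID and drop the `echo` wrapper to run it; the instance must be stopped first, matching the shutdown step above.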
Installing Karpenter
- Add the Helm repo.

  helm repo add karpenter-oci https://zoom.github.io/karpenter-oci
  helm repo update
- Install the karpenter chart.
  - Change clusterName, clusterEndpoint, clusterDns, compartmentId, and ociResourcePrincipalRegion to match your environment.

    helm install karpenter karpenter-oci/karpenter --version 1.4.2 \
      --namespace "karpenter" --create-namespace \
      --set "settings.clusterName=oke-cluster-karpenter" \
      --set "settings.clusterEndpoint=https://10.0.0.12:6443" \
      --set "settings.clusterDns=10.96.5.5" \
      --set "settings.compartmentId=ocid1.compartment.oc1..aaaaa..." \
      --set "settings.ociResourcePrincipalRegion=ap-tokyo-1" \
      --set "settings.tagNamespace=oke-karpenter-ns-1" \
      --set "settings.batchMaxDuration=10s" \
      --set "settings.batchIdleDuration=1s"
  - clusterEndpoint: the Kubernetes API private IPv4 endpoint shown in the OKE cluster details
  - clusterDns: check with the following command.

    $ kubectl get svc -n kube-system kube-dns
    NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
    kube-dns   ClusterIP   10.96.5.5    <none>        53/UDP,53/TCP,9153/TCP   89m
  - batchMaxDuration: consecutive pod batches are processed in windows of this duration.
  - batchIdleDuration: if no further pods arrive within this duration after a batch, the batch is processed immediately, even before batchMaxDuration is reached.
Creating a Node Pool
A NodePool sets constraints on the nodes Karpenter can create and on the Pods that can run on those nodes. A NodePool can be configured to do the following.
- NodeClass configuration
  Defines the nodes to be created. This guide is written for Flannel.
  - Flannel example: oke_ocinodeclasses_sample.yaml
  - Native Pod Networking example: oke_ocinodeclasses_native_cni_sample.yaml
  - Test example: karpenter-nodeclass.yaml
    apiVersion: karpenter.k8s.oracle/v1alpha1
    kind: OciNodeClass
    metadata:
      name: default-karpenter-nodeclass
    spec:
      bootConfig:
        bootVolumeSizeInGBs: 100
        bootVolumeVpusPerGB: 10
      imageSelector:
        - name: Oracle-Linux-8.10-2025.07.21-0-OKE-1.32.1-Custom-With-Images
          compartmentId: ocid1.compartment.oc1..aaaaa.....
      imageFamily: OracleOKELinux
      kubelet:
        evictionHard:
          imagefs.available: 15%
          imagefs.inodesFree: 10%
          memory.available: 750Mi
          nodefs.available: 10%
          nodefs.inodesFree: 5%
        systemReserved:
          memory: 100Mi
      subnetSelector:
        - name: oke-nodesubnet-quick-oke-cluster-karpenter-.....-regional
      vcnId: ocid1.vcn.oc1.ap-tokyo-1.amaaa.....
      securityGroupSelector:
        - name: test-security-group

  - spec.imageSelector.name: the name of the custom image created earlier
  - spec.imageSelector.compartmentId: the compartment id that contains the custom image
  - subnetSelector.name: the name of the subnet the new nodes will join
  - vcnId: the id of the VCN containing that subnet
  - securityGroupSelector.name: the name of the NSG the new nodes will join. Unlike the documentation, this is a required field; omitting it causes an error.
- NodePool configuration
  Defines the nodes to be created.
  - Test example: karpenter-nodepool.yaml
    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: oke-cluster-karpenter-nodepool
    spec:
      disruption:
        budgets:
          - nodes: 100%
        consolidateAfter: 1m0s
        consolidationPolicy: WhenEmptyOrUnderutilized
      limits:
        cpu: 160
        memory: 1280Gi
      template:
        spec:
          expireAfter: Never
          nodeClassRef:
            group: karpenter.k8s.oracle
            kind: OciNodeClass
            name: default-karpenter-nodeclass
          requirements:
            - key: karpenter.sh/capacity-type
              operator: In
              values:
                - on-demand
            - key: karpenter.k8s.oracle/instance-shape-name
              operator: In
              values:
                - VM.Standard.E4.Flex
            - key: karpenter.k8s.oracle/instance-cpu
              operator: In
              values:
                - '8'
            - key: karpenter.k8s.oracle/instance-memory
              operator: In
              values:
                - '65536'
            - key: kubernetes.io/os
              operator: In
              values:
                - linux
          terminationGracePeriod: 5m

  - metadata.name: node instances are created with names derived from this, e.g. oke-cluster-karpenter-nodepool-xxxxx
  - spec.disruption.consolidateAfter: consolidation runs this long after a Pod is added or deleted; it affects the wait time before node scale-in
  - key karpenter.k8s.oracle/instance-shape-name: the OCI shape to use
  - key karpenter.k8s.oracle/instance-cpu: in vCPUs (8 vCPU = 4 OCPU); multiple values may be listed
  - key karpenter.k8s.oracle/instance-memory: memory size (the sample value 65536 corresponds to 64 GiB expressed in MiB); multiple values may be listed
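If you want Karpenter to choose among several node sizes, the requirements entries accept multiple values. A hedged fragment (the specific values are illustrative, not from the original guide):

```yaml
# Illustrative only: let Karpenter pick among several CPU/memory sizes
- key: karpenter.k8s.oracle/instance-cpu
  operator: In
  values: ['4', '8', '16']
- key: karpenter.k8s.oracle/instance-memory
  operator: In
  values: ['32768', '65536']   # MiB, i.e. 32 GiB or 64 GiB
```

Giving Karpenter more candidate sizes generally lets it pack Pods onto fewer, better-fitting nodes.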
- Deploy the NodeClass. After deployment, run describe and verify that every item, including the image, shows no errors.
  $ kubectl apply -f karpenter-nodeclass.yaml
  ocinodeclass.karpenter.k8s.oracle/default-karpenter-nodeclass created

  $ kubectl get ocinodeclass
  NAME                          AGE
  default-karpenter-nodeclass   7s

  $ kubectl describe ocinodeclass default-karpenter-nodeclass
  Name:         default-karpenter-nodeclass
  ...
  Status:
    Conditions:
      Last Transition Time:  2025-11-03T07:03:14Z
      Message:
      Observed Generation:   1
      Reason:                ImageReady
      Status:                True
      Type:                  ImageReady
      Last Transition Time:  2025-11-03T07:03:14Z
      Message:
      Observed Generation:   1
      Reason:                SubnetsReady
      Status:                True
      Type:                  SubnetsReady
      Last Transition Time:  2025-11-03T07:03:14Z
      Message:
      Observed Generation:   1
      Reason:                SecurityGroupsReady
      Status:                True
      Type:                  SecurityGroupsReady
      Last Transition Time:  2025-11-03T07:03:14Z
      Message:
      Observed Generation:   1
      Reason:                Ready
      Status:                True
      Type:                  Ready
  ...
- Deploy the NodePool. After deployment, verify that READY=True.
  $ kubectl apply -f karpenter-nodepool.yaml
  nodepool.karpenter.sh/oke-cluster-karpenter-nodepool created

  $ kubectl get nodepools.karpenter.sh
  NAME                             NODECLASS                     NODES   READY   AGE
  oke-cluster-karpenter-nodepool   default-karpenter-nodeclass   0       True    9s
Installing Agones & a Sample Game Server
- Prepare the images. (The push targets below use the same registry hostname as the tags; pushing to a hostname the image was never tagged with would fail.)

  oci artifacts container repository create --compartment-id $COMPARTMENT_ID --display-name sandbox/agones-images/release/agones-sdk --is-public TRUE
  oci artifacts container repository create --compartment-id $COMPARTMENT_ID --display-name sandbox/agones-images/release/agones-allocator --is-public TRUE
  oci artifacts container repository create --compartment-id $COMPARTMENT_ID --display-name sandbox/agones-images/release/agones-controller --is-public TRUE
  oci artifacts container repository create --compartment-id $COMPARTMENT_ID --display-name sandbox/agones-images/release/agones-extensions --is-public TRUE
  oci artifacts container repository create --compartment-id $COMPARTMENT_ID --display-name sandbox/agones-images/release/agones-ping --is-public TRUE
  oci artifacts container repository create --compartment-id $COMPARTMENT_ID --display-name sandbox/agones-images/examples/supertuxkart-example --is-public TRUE

  NAMESPACE=`oci os ns get --query data --raw-output`

  docker pull us-docker.pkg.dev/agones-images/release/agones-sdk:1.53.0
  docker tag us-docker.pkg.dev/agones-images/release/agones-sdk:1.53.0 ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-sdk:1.53.0
  docker push ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-sdk:1.53.0

  docker pull us-docker.pkg.dev/agones-images/release/agones-allocator:1.53.0
  docker tag us-docker.pkg.dev/agones-images/release/agones-allocator:1.53.0 ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-allocator:1.53.0
  docker push ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-allocator:1.53.0

  docker pull us-docker.pkg.dev/agones-images/release/agones-controller:1.53.0
  docker tag us-docker.pkg.dev/agones-images/release/agones-controller:1.53.0 ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-controller:1.53.0
  docker push ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-controller:1.53.0

  docker pull us-docker.pkg.dev/agones-images/release/agones-extensions:1.53.0
  docker tag us-docker.pkg.dev/agones-images/release/agones-extensions:1.53.0 ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-extensions:1.53.0
  docker push ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-extensions:1.53.0

  docker pull us-docker.pkg.dev/agones-images/release/agones-ping:1.53.0
  docker tag us-docker.pkg.dev/agones-images/release/agones-ping:1.53.0 ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-ping:1.53.0
  docker push ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release/agones-ping:1.53.0

  docker pull us-docker.pkg.dev/agones-images/examples/supertuxkart-example:0.19
  docker tag us-docker.pkg.dev/agones-images/examples/supertuxkart-example:0.19 ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/examples/supertuxkart-example:0.19
  docker push ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/examples/supertuxkart-example:0.19
- Install Agones.

  helm repo add agones https://agones.dev/chart/stable
  helm repo update
  # remove any previous release first (harmless if none exists)
  helm uninstall my-agones --namespace agones-system
  helm install my-agones --namespace agones-system \
    --create-namespace agones/agones \
    --set agones.image.registry=ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/release \
    --set agones.ping.udp.expose=false
- Install the game server (example reference: https://agones.dev/site/docs/examples/supertuxkart/).

  wget https://raw.githubusercontent.com/googleforgames/agones/release-1.53.0/examples/supertuxkart/fleet.yaml
- Edit fleet.yaml.

  ...
        containers:
          - name: supertuxkart
            image: ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/examples/supertuxkart-example:0.19
            resources:
              limits:
                cpu: 1000m
                memory: 4Gi
              requests:
                cpu: 1000m
                memory: 4Gi
        nodeSelector:
          karpenter.sh/nodepool: oke-cluster-karpenter-nodepool
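Note that kubectl does not expand the literal `$NAMESPACE` inside the YAML. One way to handle it (a sketch; the namespace value and the one-line snippet are examples) is to substitute it before applying:

```shell
# fleet.yaml contains a literal $NAMESPACE; substitute it before applying.
# The namespace and the snippet file here are stand-ins for the real file.
NAMESPACE="axexample"
printf 'image: ap-tokyo-1.ocir.io/$NAMESPACE/sandbox/agones-images/examples/supertuxkart-example:0.19\n' > fleet.snippet.yaml
RENDERED=$(sed "s|\$NAMESPACE|$NAMESPACE|g" fleet.snippet.yaml)
echo "$RENDERED"
```

Against the real file, something like `sed "s|\$NAMESPACE|$NAMESPACE|g" fleet.yaml | kubectl apply -f -` would apply the rendered manifest directly.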
- Deploy.
kubectl apply -f fleet.yaml
Testing
- Check the current state.

  $ kubectl get nodes
  NAME          STATUS   ROLES   AGE     VERSION
  10.0.10.247   Ready    node    5d11h   v1.32.1
  10.0.10.71    Ready    node    5d11h   v1.32.1

  $ kubectl get fleet
  No resources found in default namespace.
- Deploy the fleet.

  kubectl apply -f fleet.yaml
- Scale out - Node Provisioning

  $ kubectl get nodes
  NAME          STATUS   ROLES    AGE     VERSION
  10.0.10.247   Ready    node     5d11h   v1.32.1
  10.0.10.71    Ready    node     5d11h   v1.32.1
  10.0.10.80    Ready    <none>   88s     v1.32.1

  $ kubectl get fleet
  NAME           SCHEDULING   DESIRED   CURRENT   ALLOCATED   READY   AGE
  supertuxkart   Packed       2         2         0           2       25s

  $ kubectl get pod -o wide
  NAME                       READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
  supertuxkart-5vkzk-5k7jb   2/2     Running   0          62s   10.244.1.130   10.0.10.80   <none>           <none>
  supertuxkart-5vkzk-5zrr9   2/2     Running   0          62s   10.244.1.131   10.0.10.80   <none>           <none>

  A new node, 10.0.10.80, was created, and the two GameServers were deployed to it.
- Scale in - Node Consolidation

  kubectl scale fleet supertuxkart --replicas=0
- Check the node, GameServer, and Pod counts. The node 10.0.10.80 created earlier has been removed.

  $ kubectl get fleet
  NAME           SCHEDULING   DESIRED   CURRENT   ALLOCATED   READY   AGE
  supertuxkart   Packed       0         0         0           0       7m15s

  $ kubectl get pod
  No resources found in default namespace.

  $ kubectl get nodes
  NAME          STATUS   ROLES   AGE     VERSION
  10.0.10.247   Ready    node    5d11h   v1.32.1
  10.0.10.71    Ready    node    5d11h   v1.32.1
Key Configuration Settings
- Helm chart settings

  Key                 Description                                                          Default
  batchMaxDuration    The maximum length of a batch window. The longer this is, the        10s
                      more pods can be considered for provisioning at one time, which
                      usually results in fewer but larger nodes.
  batchIdleDuration   The maximum amount of time with no new pending pods that, if         1s
                      exceeded, ends the current batching window. If pods arrive faster
                      than this time, the batching window is extended up to the
                      maxDuration. If they arrive slower, the pods are batched
                      separately.
- karpenter NodePool settings

  Key                           Description                                                 Default
  disruption.consolidateAfter   The duration the controller waits before attempting to      0s
                                terminate nodes that are underutilized.
  disruption.budgets.nodes      The maximum number of NodeClaims owned by this NodePool     10%
                                that can be terminating at once.
- See the configuration parameter reference
This article was written in a personal capacity, on my own time. It may contain errors, and the opinions expressed are my own.