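This article shows how to build custom application metrics with Django, Prometheus, and Kubernetes.
It explains why custom application metrics matter and uses code examples to show how to design metrics and integrate Prometheus into a Django project, as a reference for developers building applications with Django.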
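Although this topic has been discussed at length, the importance of custom application metrics cannot be overstated. Unlike the core service metrics you collect for a Django application (application and web server statistics, key database and cache operation metrics), custom metrics are data points specific to your business, with bounds and thresholds that only you know. That is what makes them interesting.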
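What makes a metric useful? Consider the following:
You run an e-commerce site and track the average number of orders. Suddenly the number of orders is no longer so average. With solid application metrics and monitoring, you can catch the bug before the losses pile up.
You are writing a crawler that pulls the latest articles from a news site every hour. Suddenly the latest articles are no longer new. Solid metrics and monitoring reveal the problem earlier.
I think you get the point.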
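To make the crawler case concrete, here is a minimal sketch of the kind of business-specific check a custom metric would feed. The two-hour threshold, the function names, and the sample timestamps are assumptions made for illustration; they are not part of the demo app.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: the crawler runs hourly, so if even the newest
# article is older than two hours, scraping has probably stopped working.
MAX_ARTICLE_AGE = timedelta(hours=2)


def newest_article_age(published_times, now):
    """Return the age of the most recently published article."""
    return now - max(published_times)


def is_stale(published_times, now, max_age=MAX_ARTICLE_AGE):
    """True when even the newest article is older than the allowed age."""
    return newest_article_age(published_times, now) > max_age


# Illustrative timestamps only.
now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = [now - timedelta(minutes=20), now - timedelta(hours=5)]
stale = [now - timedelta(hours=3), now - timedelta(hours=9)]

print(is_stale(fresh, now))
print(is_stale(stale, now))
```

In practice the age would be exported as a metric and the threshold would live in an alerting rule, so that it can change without redeploying the application.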
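Beyond the obvious dependency (pip install Django), we need a few extra packages for our pet project. Go ahead and run pip install django-prometheus. This gives us a Python Prometheus client along with some useful Django hooks, including middleware and an elegant DB wrapper. Next, we will run the Django management commands to start the project, update our settings to use the Prometheus client, and add the Prometheus URLs to the URL configuration.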
Start a new project and application
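For the purposes of this post, and in keeping with our agency's brand, we built a dog-walking service. Note that it does not actually do much, but it is enough to serve as a teaching example. Run the following commands: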
    django-admin startproject demo
    cd demo
    python manage.py startapp walker
    # settings.py
    INSTALLED_APPS = [
        ...
        'walker',
        ...
    ]
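Now let's add some basic models and views. For brevity, I only implement the parts we are going to instrument. If you want the complete example, the source is available in the demo app.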
    # walker/models.py
    from django.db import models
    from django_prometheus.models import ExportModelOperationsMixin


    class Walker(ExportModelOperationsMixin('walker'), models.Model):
        name = models.CharField(max_length=127)
        email = models.CharField(max_length=127)

        def __str__(self):
            return f'{self.name} // {self.email} ({self.id})'


    class Dog(ExportModelOperationsMixin('dog'), models.Model):
        SIZE_XS = 'xs'
        SIZE_SM = 'sm'
        SIZE_MD = 'md'
        SIZE_LG = 'lg'
        SIZE_XL = 'xl'
        DOG_SIZES = (
            (SIZE_XS, 'xsmall'),
            (SIZE_SM, 'small'),
            (SIZE_MD, 'medium'),
            (SIZE_LG, 'large'),
            (SIZE_XL, 'xlarge'),
        )

        size = models.CharField(max_length=31, choices=DOG_SIZES, default=SIZE_MD)
        name = models.CharField(max_length=127)
        age = models.IntegerField()

        def __str__(self):
            return f'{self.name} // {self.age}y ({self.size})'


    class Walk(ExportModelOperationsMixin('walk'), models.Model):
        dog = models.ForeignKey(Dog, related_name='walks', on_delete=models.CASCADE)
        walker = models.ForeignKey(Walker, related_name='walks', on_delete=models.CASCADE)

        distance = models.IntegerField(default=0, help_text='walk distance (in meters)')

        start_time = models.DateTimeField(null=True, blank=True, default=None)
        end_time = models.DateTimeField(null=True, blank=True, default=None)

        @property
        def is_complete(self):
            return self.end_time is not None

        @classmethod
        def in_progress(cls):
            """ get the list of `Walk`s currently in progress """
            return cls.objects.filter(start_time__isnull=False, end_time__isnull=True)

        def __str__(self):
            return f'{self.walker.name} // {self.dog.name} @ {self.start_time} ({self.id})'
    # walker/views.py
    from django.shortcuts import render, redirect
    from django.views import View
    from django.core.exceptions import ObjectDoesNotExist
    from django.http import HttpResponseNotFound, JsonResponse, HttpResponseBadRequest, Http404
    from django.urls import reverse
    from django.utils.timezone import now

    from walker import models, forms


    class WalkDetailsView(View):
        def get_walk(self, walk_id=None):
            try:
                return models.Walk.objects.get(id=walk_id)
            except ObjectDoesNotExist:
                raise Http404(f'no walk with ID {walk_id} in progress')


    class CheckWalkStatusView(WalkDetailsView):
        def get(self, request, walk_id=None, **kwargs):
            walk = self.get_walk(walk_id=walk_id)
            return JsonResponse({'complete': walk.is_complete})


    class CompleteWalkView(WalkDetailsView):
        def get(self, request, walk_id=None, **kwargs):
            walk = self.get_walk(walk_id=walk_id)
            return render(request, 'index.html', context={'form': forms.CompleteWalkForm(instance=walk)})

        def post(self, request, walk_id=None, **kwargs):
            try:
                walk = models.Walk.objects.get(id=walk_id)
            except ObjectDoesNotExist:
                return HttpResponseNotFound(content=f'no walk with ID {walk_id} found')

            if walk.is_complete:
                return HttpResponseBadRequest(content=f'walk {walk.id} is already complete')

            form = forms.CompleteWalkForm(data=request.POST, instance=walk)

            if form.is_valid():
                updated_walk = form.save(commit=False)
                updated_walk.end_time = now()
                updated_walk.save()

                return redirect(f'{reverse("walk_start")}?walk={walk.id}')

            return HttpResponseBadRequest(content=f'form validation failed with errors {form.errors}')


    class StartWalkView(View):
        def get(self, request):
            return render(request, 'index.html', context={'form': forms.StartWalkForm()})

        def post(self, request):
            form = forms.StartWalkForm(data=request.POST)

            if form.is_valid():
                walk = form.save(commit=False)
                walk.start_time = now()
                walk.save()

                return redirect(f'{reverse("walk_start")}?walk={walk.id}')

            return HttpResponseBadRequest(content=f'form validation failed with errors {form.errors}')
Update the application settings and add the Prometheus URLs
Now that we have a Django project with its settings in place, we can add the configuration that django-prometheus needs. Add the following to settings.py:
    import os

    INSTALLED_APPS = [
        ...
        'django_prometheus',
        ...
    ]

    MIDDLEWARE = [
        'django_prometheus.middleware.PrometheusBeforeMiddleware',
        ...
        'django_prometheus.middleware.PrometheusAfterMiddleware',
    ]

    # we're assuming a Postgres DB here because, well, that's just the right choice :)
    DATABASES = {
        'default': {
            'ENGINE': 'django_prometheus.db.backends.postgresql',
            'NAME': os.getenv('DB_NAME'),
            'USER': os.getenv('DB_USER'),
            'PASSWORD': os.getenv('DB_PASSWORD'),
            'HOST': os.getenv('DB_HOST'),
            'PORT': os.getenv('DB_PORT', '5432'),
        },
    }
Add the URL configuration to urls.py:

    from django.urls import include, path

    urlpatterns = [
        ...
        path('', include('django_prometheus.urls')),
    ]
We now have a basic, configured application that is ready to be instrumented.
Because django-prometheus works out of the box, we can immediately track some basic model operations, such as inserts and deletes. You can see these at the /metrics endpoint:
[Figure: default metrics provided by django-prometheus]
Let's make it a bit more interesting.
Add a walker/metrics.py file that defines some basic metrics to track.
    # walker/metrics.py
    from prometheus_client import Counter, Histogram


    walks_started = Counter('walks_started', 'number of walks started')
    walks_completed = Counter('walks_completed', 'number of walks completed')
    invalid_walks = Counter('invalid_walks', 'number of walks attempted to be started, but invalid')

    walk_distance = Histogram('walk_distance', 'distribution of distance walked', buckets=[0, 50, 200, 400, 800, 1600, 3200])
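Simple, isn't it? The Prometheus documentation explains well what each metric type is for. In short, we use counters for values that only ever increase over time, and histograms for metrics that capture a distribution of values. Now let's instrument the application code.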
    # walker/views.py
    ...
    from walker import metrics
    ...

    class CompleteWalkView(WalkDetailsView):
        ...
        def post(self, request, walk_id=None, **kwargs):
            ...
            if form.is_valid():
                updated_walk = form.save(commit=False)
                updated_walk.end_time = now()
                updated_walk.save()

                # count the completed walk and record its distance
                metrics.walks_completed.inc()
                metrics.walk_distance.observe(updated_walk.distance)

                return redirect(f'{reverse("walk_start")}?walk={walk.id}')

            return HttpResponseBadRequest(content=f'form validation failed with errors {form.errors}')

    ...

    class StartWalkView(View):
        ...
        def post(self, request):
            form = forms.StartWalkForm(data=request.POST)

            if form.is_valid():
                walk = form.save(commit=False)
                walk.start_time = now()
                walk.save()

                metrics.walks_started.inc()

                return redirect(f'{reverse("walk_start")}?walk={walk.id}')

            # the form did not validate, so count the attempt as invalid
            metrics.invalid_walks.inc()

            return HttpResponseBadRequest(content=f'form validation failed with errors {form.errors}')
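To make the histogram semantics concrete, here is a rough pure-Python model of what happens when distances are observed. It does not use prometheus_client; it is only a sketch of the cumulative-bucket idea, using the bucket bounds from walker/metrics.py plus the implicit +Inf bucket. The function name and the sample distances are made up for illustration.

```python
import math

# Bucket upper bounds from walker/metrics.py, plus the implicit +Inf bucket.
BUCKETS = [0, 50, 200, 400, 800, 1600, 3200, math.inf]


def observe_all(values, buckets=BUCKETS):
    """Return cumulative bucket counts, the sum and the count of the values."""
    counts = {bound: 0 for bound in buckets}
    for value in values:
        for bound in buckets:
            # Buckets are cumulative: a value counts toward every bucket
            # whose upper bound ("le") is greater than or equal to it.
            if value <= bound:
                counts[bound] += 1
    return counts, sum(values), len(values)


# Illustrative walk distances in meters.
counts, total, n = observe_all([30, 120, 450, 5000])
for bound, count in counts.items():
    print(f"le={bound}: {count}")
print(f"sum={total} count={n}")
```

Because the buckets are cumulative, the +Inf bucket always equals the total count, and a walk longer than 3200 meters only shows up there. That is worth keeping in mind when choosing bucket bounds for your own data.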
Send a few sample requests and you will see the new metrics appear.
[Figure: metrics showing walk distance and walks created]
[Figure: the defined metrics can now be looked up in Prometheus]
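If you prefer to check the endpoint from a script rather than by eye, the text format served at /metrics is easy to read. The sketch below parses a hand-written sample, not real output from the demo app, and it ignores optional timestamps and label parsing, so treat it as an illustration rather than a complete parser.

```python
# Hand-written sample in the Prometheus text exposition format.
# The values are illustrative and do not come from a real run.
SAMPLE = """\
# HELP walks_started_total number of walks started
# TYPE walks_started_total counter
walks_started_total 3.0
walk_distance_bucket{le="50.0"} 1.0
walk_distance_bucket{le="+Inf"} 2.0
walk_distance_count 2.0
walk_distance_sum 480.0
"""


def parse_metrics(text):
    """Map each series (name plus labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


samples = parse_metrics(SAMPLE)
print(samples["walks_started_total"])
print(samples["walk_distance_sum"] / samples["walk_distance_count"])
```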
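At this point we have added custom metrics to the code, instrumented the application to track them, and verified that they are updated and available at /metrics. Let's move on and deploy the instrumented application to a Kubernetes cluster.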
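I will only list the configuration related to tracking and exporting metrics; the complete Helm chart deployment and service configuration can be found in the demo app. As a starting point, here is the deployment and configmap configuration that relates to exporting metrics: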
    # helm/demo/templates/nginx-conf-configmap.yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: {{ include "demo.fullname" . }}-nginx-conf
      ...
    data:
      demo.conf: |
        upstream app_server {
          server 127.0.0.1:8000 fail_timeout=0;
        }

        server {
          listen 80;
          client_max_body_size 4G;

          # set the correct host(s) for your site
          server_name{{ range .Values.ingress.hosts }} {{ . }}{{- end }};

          keepalive_timeout 5;

          root /code/static;

          location / {
            # checks for static file, if not found proxy to app
            try_files $uri @proxy_to_app;
          }

          location ^~ /metrics {
            auth_basic           "Metrics";
            auth_basic_user_file /etc/nginx/secrets/.htpasswd;

            proxy_pass http://app_server;
          }

          location @proxy_to_app {
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header Host $http_host;
            # we don't want nginx trying to do something clever with
            # redirects, we set the Host: header above already.
            proxy_redirect off;
            proxy_pass http://app_server;
          }
        }
    # helm/demo/templates/deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    ...
    spec:
      ...
      template:
        metadata:
          labels:
            app.kubernetes.io/name: {{ include "demo.name" . }}
            app.kubernetes.io/instance: {{ .Release.Name }}
            app: {{ include "demo.name" . }}
        spec:
          volumes:
            ...
            - name: nginx-conf
              configMap:
                name: {{ include "demo.fullname" . }}-nginx-conf
            - name: prometheus-auth
              secret:
                secretName: prometheus-basic-auth
            ...
          containers:
            # nginx sidecar: reverse proxy that also guards /metrics
            - name: {{ .Chart.Name }}-nginx
              image: "{{ .Values.nginx.image.repository }}:{{ .Values.nginx.image.tag }}"
              imagePullPolicy: IfNotPresent
              volumeMounts:
                ...
                - name: nginx-conf
                  mountPath: /etc/nginx/conf.d/
                - name: prometheus-auth
                  mountPath: /etc/nginx/secrets/.htpasswd
              ports:
                - name: http
                  containerPort: 80
                  protocol: TCP
            # the Django app, served by a single multi-threaded gunicorn worker
            - name: {{ .Chart.Name }}
              image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
              imagePullPolicy: {{ .Values.image.pullPolicy }}
              command: ["gunicorn", "--worker-class", "gthread", "--threads", "3", "--bind", "0.0.0.0:8000", "demo.wsgi:application"]
              env: {{ include "demo.env" . | nindent 12 }}
              ports:
                - name: gunicorn
                  containerPort: 8000
                  protocol: TCP
              ...
Nothing magical here, just some YAML. Two points are worth emphasizing:
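We put /metrics behind authentication through an nginx reverse proxy by setting the auth_basic directives on its location block. You would probably want to deploy gunicorn behind a reverse proxy anyway, but doing it this way brings the added benefit of protecting the metrics.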
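We use multi-threaded gunicorn rather than multiple workers. Although you can enable multiprocess mode for the Prometheus client, the setup is more involved in a Kubernetes environment. Why does this matter? The risk of running multiple workers in one pod is that each worker reports its own set of metric values when scraped. Because the Prometheus Kubernetes SD scrape configuration groups the service at the pod level, these (potentially) jumping values are wrongly classified as counter resets, which leads to inconsistent measurements. You do not necessarily have to follow everything above, but the takeaway is this: if you are not sure, start with a single-thread, single-worker gunicorn setup, or with a single worker and multiple threads.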
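The counter-reset problem is easier to see with numbers. The sketch below is a simplified model, not Prometheus code: it treats any drop in a counter as a reset to zero, which is the rule that rate() and increase() apply, and it ignores extrapolation. The two workers and their values are hypothetical.

```python
def increase_with_resets(samples):
    """Total increase over the samples, treating any drop as a counter reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:
            # The value went down, so assume the counter restarted from
            # zero and count the whole current value as new increase.
            total += cur
        else:
            total += cur - prev
    return total


# Two hypothetical gunicorn workers in one pod, each with its own
# in-process counter. Index i is the value at scrape number i.
worker_a = [10, 11, 12, 13]
worker_b = [4, 5, 6, 7]

# Each scrape of the pod is answered by whichever worker picks it up,
# here alternating between the two.
scraped = [worker_a[0], worker_b[1], worker_a[2], worker_b[3]]

# What a single shared counter would have reported instead.
combined = [a + b for a, b in zip(worker_a, worker_b)]

print(increase_with_resets(scraped))
print(increase_with_resets(combined))
```

Only six walks were started over these four scrapes, yet the alternating series suggests nineteen, because every switch from the busier worker to the quieter one looks like a reset.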
Following the Helm documentation, deploying Prometheus is very simple and needs no extra work:
helm upgrade --install prometheus stable/prometheus
After a few minutes, you should be able to reach the Prometheus pod through port-forward (the default container port is 9090).
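The Prometheus Helm chart has a large number of customization options, but we only need to set extraScrapeConfigs. Create a values.yaml file. You can skip this part and use the demo app directly as a reference. The file contents are as follows: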
    extraScrapeConfigs: |
      - job_name: demo
        scrape_interval: 5s
        metrics_path: /metrics
        basic_auth:
          username: prometheus
          password: prometheus
        tls_config:
          insecure_skip_verify: true
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names:
                - default
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_label_app]
            regex: demo
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: http
            action: keep
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          - source_labels: [__meta_kubernetes_service_name]
            target_label: service
          - source_labels: [__meta_kubernetes_service_name]
            target_label: job
          - target_label: endpoint
            replacement: http
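The two keep rules in relabel_configs decide which discovered endpoints are scraped at all. As a rough illustration, the sketch below mimics them in Python. It is not how Prometheus is implemented, and the discovered targets are hypothetical. The one detail worth noting is that Prometheus anchors relabel regexes at both ends, so the pattern demo does not match demo-worker.

```python
import re

# The two "keep" rules from the scrape config: (source label, regex).
KEEP_RULES = [
    ("__meta_kubernetes_service_label_app", "demo"),
    ("__meta_kubernetes_endpoint_port_name", "http"),
]


def is_kept(labels, rules=KEEP_RULES):
    """True if a discovered target survives every keep rule."""
    for source_label, regex in rules:
        # Relabel regexes are fully anchored, hence fullmatch.
        # A missing label is treated as an empty string.
        if re.fullmatch(regex, labels.get(source_label, "")) is None:
            return False
    return True


# Hypothetical endpoints as service discovery might present them.
nginx_port = {
    "__meta_kubernetes_service_label_app": "demo",
    "__meta_kubernetes_endpoint_port_name": "http",
}
gunicorn_port = {
    "__meta_kubernetes_service_label_app": "demo",
    "__meta_kubernetes_endpoint_port_name": "gunicorn",
}
other_service = {
    "__meta_kubernetes_service_label_app": "demo-worker",
    "__meta_kubernetes_endpoint_port_name": "http",
}

for target in (nginx_port, gunicorn_port, other_service):
    print(is_kept(target))
```

Under this configuration, only an endpoint whose port is named http, which is the nginx container, would be scraped. That matches the intent of sending all metrics traffic through the authenticated proxy.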
Once values.yaml is created, you can update the configuration of the prometheus deployment with the following command.
helm upgrade --install prometheus stable/prometheus -f values.yaml
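To verify that every step is configured correctly, open http://localhost:9090/targets in a browser (assuming you have already reached the pod running Prometheus through port-forward). If you see the demo application in the list of targets, everything is working.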
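The same check can be scripted against the Prometheus HTTP API, which serves the target list as JSON at /api/v1/targets. The sketch below only parses a response body; the sample payload is hand-written with a made-up pod IP and trimmed to the fields it uses. Filtering on scrapePool is an assumption chosen here because the job label is rewritten to the service name by the relabel rules.

```python
import json


def target_health(payload, scrape_pool):
    """Return (scrapeUrl, health) pairs for targets in the given scrape pool."""
    data = json.loads(payload)
    return [
        (target["scrapeUrl"], target["health"])
        for target in data["data"]["activeTargets"]
        if target.get("scrapePool") == scrape_pool
    ]


# Hand-written, trimmed sample response; the address is hypothetical.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "scrapePool": "demo",
                "scrapeUrl": "http://10.0.0.12:80/metrics",
                "health": "up",
            },
            {
                "scrapePool": "prometheus",
                "scrapeUrl": "http://localhost:9090/metrics",
                "health": "up",
            },
        ],
        "droppedTargets": [],
    },
})

for url, health in target_health(SAMPLE_RESPONSE, "demo"):
    print(url, health)
```

With the port-forward in place, the payload could be fetched from http://localhost:9090/api/v1/targets with any HTTP client and passed to the same function.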