Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Get the time that has passed since the last increase of a Prometheus counter

Consider a Prometheus metric foo_total that counts the total number of occurrences of an event foo, i.e. the metric will only increase as long as the providing service isn't restarted.

Is there any way to get the timespan (e.g. number of seconds) since the last increase of that metric? I know that, due to the scrape period, the value won't be very accurate, but an accuracy of a couple of minutes should be sufficient for me.

Background: I want to use that kind of query in Grafana to see whether some services are used regularly and whether some jobs are done within a defined grace period. I don't have any influence on the metric itself.


1 Reply


Below is the JSON for a Singlestat panel that will display the time of the last update to the up{job="prometheus"} metric. This is not exactly what you asked for: it's the last time rather than the timespan since; it's only useful as a Singlestat panel (i.e. you can't take the value and graph it since it's not a single value); and it will only display changes covered by the dashboard's time range.

The underlying query is timestamp(changes(up{job="prometheus"}[$__interval]) > 0) * 1000, so the query basically returns every timestamp at which there have been any changes during the preceding $__interval (a duration that Grafana determines dynamically from the time range and the size of the Singlestat panel in pixels). The Singlestat panel will then display the last value, if there is any. (The * 1000 is there because Grafana expects timestamps in milliseconds.)

{
  "type": "singlestat",
  "title": "Last Change",
  "gridPos": {
    "x": 0,
    "y": 0,
    "w": 12,
    "h": 9
  },
  "id": 8,
  "targets": [
    {
      "expr": "timestamp(changes(up{job=\"prometheus\"}[$__interval]) > 0) * 1000",
      "intervalFactor": 1,
      "format": "time_series",
      "refId": "A",
      "interval": "10s"
    }
  ],
  "links": [],
  "maxDataPoints": 100,
  "interval": null,
  "cacheTimeout": null,
  "format": "dateTimeAsIso",
  "prefix": "",
  "postfix": "",
  "nullText": null,
  "valueMaps": [
    {
      "value": "null",
      "op": "=",
      "text": "N/A"
    }
  ],
  "mappingTypes": [
    {
      "name": "value to text",
      "value": 1
    },
    {
      "name": "range to text",
      "value": 2
    }
  ],
  "rangeMaps": [
    {
      "from": "null",
      "to": "null",
      "text": "N/A"
    }
  ],
  "mappingType": 1,
  "nullPointMode": "connected",
  "valueName": "current",
  "prefixFontSize": "50%",
  "valueFontSize": "80%",
  "postfixFontSize": "50%",
  "thresholds": "",
  "colorBackground": false,
  "colorValue": false,
  "colors": [
    "#299c46",
    "rgba(237, 129, 40, 0.89)",
    "#d44a3a"
  ],
  "sparkline": {
    "show": false,
    "full": false,
    "lineColor": "rgb(31, 120, 193)",
    "fillColor": "rgba(31, 118, 189, 0.18)"
  },
  "gauge": {
    "show": false,
    "minValue": 0,
    "maxValue": 100,
    "thresholdMarkers": true,
    "thresholdLabels": false
  },
  "tableColumn": ""
}
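
To make the behaviour of the panel's query more concrete, here is a rough Python simulation of it. This is only a sketch under stated assumptions: the sample data, the 15 s scrape interval, the 10 s step, and the helper names changes and last_change_ms are all made up for illustration, and the window is treated as excluding its left boundary, which may not match every Prometheus version exactly.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (scrape time in seconds, metric value)


def changes(samples: List[Sample], start: int, end: int) -> int:
    """Count value changes between consecutive samples with start < t <= end."""
    values = [v for t, v in samples if start < t <= end]
    return sum(1 for prev, cur in zip(values, values[1:]) if cur != prev)


def last_change_ms(samples: List[Sample], eval_times: List[int], window: int) -> Optional[int]:
    """Keep the last evaluation time (in ms) whose window saw a change."""
    last = None
    for t in eval_times:
        if changes(samples, t - window, t) > 0:
            last = t * 1000  # Grafana expects milliseconds
    return last


# Hypothetical data: scraped every 15 s, the value changes at t = 45 s.
samples = [(t, 1.0 if t < 45 else 2.0) for t in range(0, 121, 15)]
eval_times = list(range(0, 121, 10))  # one evaluation per 10 s step

print(last_change_ms(samples, eval_times, window=30))  # 50000
print(last_change_ms(samples, eval_times, window=10))  # None
```

Two things stand out in this toy run. The reported time (50 s) is the evaluation step that noticed the change, not the scrape at 45 s. And a window shorter than the scrape interval never contains two samples, so it never reports a change, which suggests that $__interval needs to be comfortably longer than the scrape interval for the panel to show anything.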

If you wanted this to be more reliable, you could define a Prometheus recording rule whose value is the current timestamp if there have been any changes in the last few seconds/minutes (depending on how frequently Prometheus collects the metric), or the rule's previous value otherwise. E.g. (not tested):

groups:
- name: last-update
  rules:
  - record: last_update
    expr: |
      timestamp(changes(up{job="prometheus"}[1m]) > 0)
        or
      last_update

Replace up{job="prometheus"} with your metric selector and 1m with an interval that is at least as long as your collection interval, and ideally quite a bit longer, in order to cover any collection interval jitter or missed scrapes.

Then you would use an expression like time() - last_update in Grafana to get the timespan since the last change. And you could use it in any sort of panel, without having to rely on the panel picking the last value for you.
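
The carry-forward logic of the recording rule can be sketched in plain Python as well. Again, this is a simulation under assumptions rather than a description of how Prometheus evaluates rules internally: the data, the 15 s evaluation interval, the 60 s window, and the names changed and run_rule are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (scrape time in seconds, metric value)


def changed(samples: List[Sample], start: int, end: int) -> bool:
    """True if consecutive samples with start < t <= end differ in value."""
    values = [v for t, v in samples if start < t <= end]
    return any(cur != prev for prev, cur in zip(values, values[1:]))


def run_rule(samples: List[Sample], eval_times: List[int], window: int) -> Optional[int]:
    """Repeatedly evaluate 'timestamp(changes(m[window]) > 0) or <previous value>'."""
    recorded = None  # nothing is recorded before the first detected change
    for t in eval_times:
        if changed(samples, t - window, t):
            recorded = t  # left side of 'or': the current evaluation timestamp
        # otherwise the right side of 'or' applies: keep the previous value
    return recorded


# Hypothetical data: scraped every 15 s, the value changes at t = 45 s.
samples = [(t, 1.0 if t < 45 else 2.0) for t in range(0, 121, 15)]
eval_times = list(range(0, 121, 15))  # rule evaluated every 15 s

recorded = run_rule(samples, eval_times, window=60)
now = 120
print(recorded)        # 75
print(now - recorded)  # 45, the equivalent of time() minus the recorded value
```

In this run the recorded timestamp (75 s) trails the actual change (45 s), because every evaluation whose window still contains the change refreshes the value. A shorter window reduces that lag but leaves less margin for jitter or missed scrapes.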

Edit: One of the new features expected in the 2.7.0 release of Prometheus (which is due in about 2-3 weeks, if they keep to their 6 week release schedule) is subquery support, meaning that you should be able to implement the latter, "more reliable" solution without the help of a recording rule.

If I understand this correctly, the query should look something like this:

time() - max_over_time(timestamp(changes(up{job="prometheus"}[5m]) > 0)[24h:1m])

But, just as before, this will not be an efficient query, especially over large numbers of series. You may also want to adjust the result for the 5 minute range (the last matching evaluation can trail the actual change by up to 5 minutes) and limit it to a non-negative value using clamp_min.

