doc/cdeftutorial.pod

   1 =head1 NAME
   2
   3 cdeftutorial - Alex van den Bogaerdt's CDEF tutorial
   4
   5 =head1 DESCRIPTION
   6
   7 If you provide a question, I will try to provide an answer in the next
   8 release of this tutorial. No feedback equals no changes! Additions to
   9 this document are also welcome.  -- Alex van den Bogaerdt
  10 E<lt>alex@ergens.op.het.netE<gt>
  11
  12 =head2 Why this tutorial?
  13
  14 One of the powerful parts of RRDtool is its ability to do all sorts
  15 of calculations on the data retrieved from its databases. However,
  16 RRDtool's many options and syntax make it difficult for the average
  17 user to understand. The manuals are good at explaining what these
  18 options do; however they do not (and should not) explain in detail
  19 why they are useful. As with my RRDtool tutorial: if you want a
  20 simple document in simple language you should read this tutorial.
  21 If you are happy with the official documentation, you may find this
  22 document too simple or even boring. If you do choose to read this
  23 tutorial, I also expect you to have read and fully understood my
  24 other tutorial.
  25
  26 =head2 More reading
  27
  28 If you have difficulties with the way I try to explain it please read
  29 Steve Rader's L<rpntutorial>. It may help you understand how this all works.
  30
  31 =head1 What are CDEFs?
  32
  33 When retrieving data from an RRD, you are using a "DEF" to work with
  34 that data. Think of it as a variable that changes over time (where
  35 time is the x-axis). The value of this variable is what is found in
  36 the database at that particular time and you can't do any
  37 modifications on it. This is what CDEFs are for: they take values
  38 from DEFs and perform calculations on them.
  39
  40 =head1 Syntax
  41
  42    DEF:var_name_1=some.rrd:ds_name:CF
  43    CDEF:var_name_2=RPN_expression
  44
  45 You first define "var_name_1" to be data collected from data source
  46 "ds_name" found in RRD "some.rrd" with consolidation function "CF".
  47
  48 Assume the ifInOctets SNMP counter is saved in mrtg.rrd as the DS "in".
  49 Then the following DEF defines a variable for the average of that
  50 data source:
  51
  52    DEF:inbytes=mrtg.rrd:in:AVERAGE
  53
  54 Say you want to display bits per second (instead of bytes per second
  55 as stored in the database).  You have to define a calculation
  56 (hence "CDEF") on variable "inbytes" and use that variable (inbits)
  57 instead of the original:
  58
  59    CDEF:inbits=inbytes,8,*
  60
  61 This tells RRDtool to multiply inbytes by eight to get inbits. I'll
  62 explain later how this works. In the graphing or printing functions,
  63 you can now use inbits where you would use inbytes otherwise.
  64
  65 Note that the variable name used in the CDEF (inbits) must not be the
  66 same as the variable named in the DEF (inbytes)!
  67
  68 =head1 RPN-expressions
  69
  70 RPN is short-hand for Reverse Polish Notation. It works as follows.
  71 You put the variables or numbers on a stack. You also put operations
  72 (things-to-do) on the stack and this stack is then processed. The result
  73 will be placed on the stack. At the end, there should be exactly one
  74 number left: the outcome of the series of operations. If there is not
  75 exactly one number left, RRDtool will complain loudly.
  76
  77 The multiplication by eight shown above will look like this:
  78
  79 =over 4
  80
  81 =item 1.
  82
  83 Start with an empty stack
  84
  85 =item 2.
  86
  87 Put the content of variable inbytes on the stack
  88
  89 =item 3.
  90
  91 Put the number eight on the stack
  92
  93 =item 4.
  94
  95 Put the operation multiply on the stack
  96
  97 =item 5.
  98
  99 Process the stack
 100
 101 =item 6.
 102
 103 Retrieve the value from the stack and put it in variable inbits
 104
 105 =back
 106
 107 We will now do an example with real numbers. Suppose the variable
 108 inbytes would have value 10, the stack would be:
 109
 110 =over 4
 111
 112 =item 1.
 113
 114 ||
 115
 116 =item 2.
 117
 118 |10|
 119
 120 =item 3.
 121
 122 |10|8|
 123
 124 =item 4.
 125
 126 |10|8|*|
 127
 128 =item 5.
 129
 130 |80|
 131
 132 =item 6.
 133
 134 ||
 135
 136 =back
 137
 138 Processing the stack (step 5) will retrieve one value from the stack
 139 (from the right at step 4). This is the operation multiply and this
 140 takes two values off the stack as input. The result is put back on the
 141 stack (the value 80 in this case). For multiplication the order doesn't
 142 matter, but for other operations like subtraction and division it does.
 143 Generally speaking you have the following order:
 144
 145    y = A - B  -->  y=minus(A,B)  -->  CDEF:y=A,B,-
 146
 147 This is not very intuitive (at least most people don't think so). For
 148 the function f(A,B) you reverse the position of "f", but you do not
 149 reverse the order of the variables.
 150
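If it helps to see the stack processing as a program, here is a minimal
Python sketch of it. It is not how RRDtool is implemented; the function
name "rpn" and the way variables are passed in are assumptions made for
this illustration, and only the four basic arithmetic operators are handled.

```python
import operator

# Assumed for this sketch: only the four basic arithmetic operators.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def rpn(expression, variables):
    """Evaluate a comma-separated RPN expression for one point in time."""
    stack = []
    for token in expression.split(","):
        if token in OPERATORS:
            right = stack.pop()   # pushed last, so it is the second input
            left = stack.pop()    # pushed first, so it is the first input
            stack.append(OPERATORS[token](left, right))
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("expected exactly one value left on the stack")
    return stack[0]

inbits = rpn("inbytes,8,*", {"inbytes": 10})
difference = rpn("A,B,-", {"A": 7, "B": 2})
print(inbits, difference)
```

Note how the value pushed last is popped first: for "A,B,-" the operator
receives A as its first input and B as its second, so the result is A-B.
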
 151 =head1 Converting your wishes to RPN
 152
 153 First, get a clear picture of what you want to do. Break down the problem
 154 into smaller portions until they cannot be split anymore. Then it is rather
 155 simple to convert your ideas into RPN.
 156
 157 Suppose you have several RRDs and would like to add up some counters in
 158 them. These could be, for instance, the counters for every WAN link you
 159 are monitoring.
 160
 161 You have:
 162
 163    router1.rrd with link1in link2in
 164    router2.rrd with link1in link2in
 165    router3.rrd with link1in link2in
 166
 167 Suppose you would like to add up all these counters, except for link2in
 168 inside router2.rrd. You need to do:
 169
 170 (in this example, "router1.rrd:link1in" means the DS link1in inside the
 171 RRD router1.rrd)
 172
 173    router1.rrd:link1in
 174    router1.rrd:link2in
 175    router2.rrd:link1in
 176    router3.rrd:link1in
 177    router3.rrd:link2in
 178    --------------------   +
 179    (outcome of the sum)
 180
 181 As a mathematical function, this could be written:
 182
 183 C<add(router1.rrd:link1in , router1.rrd:link2in , router2.rrd:link1in , router3.rrd:link1in , router3.rrd:link2in)>
 184
 185 With RRDtool and RPN, first, define the inputs:
 186
 187    DEF:a=router1.rrd:link1in:AVERAGE
 188    DEF:b=router1.rrd:link2in:AVERAGE
 189    DEF:c=router2.rrd:link1in:AVERAGE
 190    DEF:d=router3.rrd:link1in:AVERAGE
 191    DEF:e=router3.rrd:link2in:AVERAGE
 192
 193 Now, the mathematical function becomes: C<add(a,b,c,d,e)>
 194
 195 In RPN, there's no operator that sums more than two values so you need
 196 to do several additions. You add a and b, add c to the result, add d
 197 to the result and add e to the result.
 198
 199    push a:         a     stack contains the value of a
 200    push b and add: b,+   stack contains the result of a+b
 201    push c and add: c,+   stack contains the result of a+b+c
 202    push d and add: d,+   stack contains the result of a+b+c+d
 203    push e and add: e,+   stack contains the result of a+b+c+d+e
 204
 205 What was calculated here would be written down as:
 206
 207    ( ( ( (a+b) + c) + d) + e)
 208
 209 This is in RPN:  C<CDEF:result=a,b,+,c,+,d,+,e,+>
 210
 211 This is correct but it can be made clearer to humans. It does
 212 not matter if you add a to b and then add c to the result or first
 213 add b to c and then add a to the result. This makes it possible to
 214 rewrite the RPN into C<CDEF:result=a,b,c,d,e,+,+,+,+> which is
 215 evaluated differently:
 216
 217    push value of variable a on the stack: a
 218    push value of variable b on the stack: a b
 219    push value of variable c on the stack: a b c
 220    push value of variable d on the stack: a b c d
 221    push value of variable e on the stack: a b c d e
 222    push operator + on the stack:          a b c d e +
 223    and process it:                        a b c P   (where P == d+e)
 224    push operator + on the stack:          a b c P +
 225    and process it:                        a b Q     (where Q == c+P)
 226    push operator + on the stack:          a b Q +
 227    and process it:                        a R       (where R == b+Q)
 228    push operator + on the stack:          a R +
 229    and process it:                        S         (where S == a+R)
 230
 231 As you can see the RPN expression C<a,b,c,d,e,+,+,+,+> will evaluate to
 232 C<((((d+e)+c)+b)+a)> and it has the same outcome as C<a,b,+,c,+,d,+,e,+>.
 233 This follows from the commutative and associative laws of addition,
 234 but you may forget these names right away, as long as you remember what
 235 they mean.
 236
 237 Now look at an expression that contains a multiplication:
 238
 239 First in normal math: C<let result = a+b*c>. In this case you can't
 240 choose the order yourself, you have to start with the multiplication
 241 and then add a to it. You may alter the position of b and c, you must
 242 not alter the position of a and b.
 243
 244 You have to take this into consideration when converting this expression
 245 into RPN. Read it as: "Add the outcome of b*c to a" and then it is
 246 easy to write the RPN expression: C<result=a,b,c,*,+>
 247 Another expression that would return the same: C<result=b,c,*,a,+>
 248
 249 In normal math, you may encounter something like "a*(b+c)" and this
 250 can also be converted into RPN. The parentheses just tell you to first
 251 add b and c, and then multiply a by the result. Again, now it is
 252 easy to write it in RPN: C<result=a,b,c,+,*>. Note that this is very
 253 similar to one of the expressions in the previous paragraph, only the
 254 multiplication and the addition changed places.
 255
 256 When you have problems with RPN or when RRDtool is complaining, it's
 257 usually a good thing to write down the stack on a piece of paper
 258 and see what happens. Have the manual ready and pretend to be RRDtool.
 259 Just do all the math by hand to see what happens, I'm sure this will
 260 solve most, if not all, problems you encounter.
 261
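The advice to write down the stack on paper can also be followed in code.
The sketch below (Python, with a hypothetical helper called "trace" that
only knows "+" and "*") records the stack after every token. The values
for a through e are made up for the example.

```python
def trace(expression, variables):
    """Return the stack contents after each token has been processed."""
    stack, snapshots = [], []
    for token in expression.split(","):
        if token == "+":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif token == "*":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            stack.append(variables[token])
        snapshots.append(list(stack))
    return snapshots

values = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
for step in trace("a,b,c,d,e,+,+,+,+", values):
    print(step)

# The last snapshot holds the single value that remains on the stack.
one_by_one = trace("a,b,+,c,+,d,+,e,+", values)[-1]
all_at_end = trace("a,b,c,d,e,+,+,+,+", values)[-1]
sum_then_multiply = trace("a,b,c,+,*", values)[-1]   # a*(b+c)
multiply_then_sum = trace("a,b,c,*,+", values)[-1]   # a+b*c
print(one_by_one, all_at_end, sum_then_multiply, multiply_then_sum)
```

Both ways of writing the sum leave the same single value on the stack,
while swapping "+" and "*" in the last two expressions does change the
outcome.
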
 262 =head1 Some special numbers
 263
 264 =head2 The unknown value
 265
 266 Sometimes collecting your data will fail. This can be very common,
 267 especially when querying over busy links. RRDtool can be configured
 268 to allow for one (or even more) unknown value(s) and calculate the missing
 269 update. You can, for instance, query your device every minute. This is
 270 creating one so-called PDP or primary data point per minute. If you
 271 defined your RRD to contain an RRA that stores 5-minute values, you need
 272 five of those PDPs to create one CDP (consolidated data point).
 273 These PDPs can become unknown in two cases:
 274
 275 =over 4
 276
 277 =item 1.
 278
 279 The updates are too far apart. This is tuned using the "heartbeat" setting.
 280
 281 =item 2.
 282
 283 The update was set to unknown on purpose by inserting no value (using the
 284 template option) or by using "U" as the value to insert.
 285
 286 =back
 287
 288 When a CDP is calculated, another mechanism determines if this CDP is valid
 289 or not. If there are too many PDPs unknown, the CDP is unknown as well.
 290 This is determined by the xff factor. Please note that one unknown counter
 291 update can result in two unknown PDPs! If you only allow for one unknown
 292 PDP per CDP, this makes the CDP go unknown!
 293
 294 Suppose the counter increments by one per second and you retrieve it
 295 every minute:
 296
 297    counter value    resulting rate
 298    10'000
 299    10'060            1; (10'060-10'000)/60 == 1
 300    10'120            1; (10'120-10'060)/60 == 1
 301    unknown           unknown; you don't know the last value
 302    10'240            unknown; you don't know the previous value
 303    10'300            1; (10'300-10'240)/60 == 1
 304
 305 If the CDP were to be calculated from the last five updates, it would get
 306 two unknown PDPs and three known PDPs, so a fraction of 0.4 is unknown.
 307 Had xff been set to 0.5, which by the way is a commonly used factor, the
 308 CDP would have a known value of 1. Had xff been set to 0.2, the
 309 resulting CDP would be unknown.
 310
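The table and the xff rule can be played through in Python. This is a
simplified model, not RRDtool's code: None stands for unknown, the
consolidation function is assumed to be AVERAGE, and the CDP is assumed to
become unknown as soon as the fraction of unknown PDPs exceeds xff.

```python
def rates(counter_values, step=60):
    """Turn successive counter readings into per-second rates (None = unknown)."""
    result = []
    for previous, current in zip(counter_values, counter_values[1:]):
        if previous is None or current is None:
            result.append(None)   # one unknown reading spoils two rates
        else:
            result.append((current - previous) / step)
    return result

def consolidate(pdps, xff):
    """Average the known PDPs, or return None if too many are unknown."""
    unknown = sum(1 for pdp in pdps if pdp is None)
    if unknown > xff * len(pdps):
        return None
    known = [pdp for pdp in pdps if pdp is not None]
    return sum(known) / len(known)

readings = [10000, 10060, 10120, None, 10240, 10300]
pdps = rates(readings)
print(pdps)
print(consolidate(pdps, xff=0.5))
print(consolidate(pdps, xff=0.2))
```

With xff set to 0.5 the three known rates are averaged; with 0.2 the two
unknown PDPs are already too many and the CDP is unknown.
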
 311 You have to decide the proper values for heartbeat, number of PDPs per
 312 CDP and the xff factor. As you can see from the previous text they define
 313 the behavior of your RRA.
 314
 315 =head2 Working with unknown data in your database
 316
 317 As you have read in the previous chapter, entries in an RRA can be
 318 set to the unknown value. If you do calculations with this type of
 319 value, the result has to be unknown too. This means that an expression
 320 such as C<result=a,b,+> will be unknown if either a or b is unknown.
 321 It would be wrong to just ignore the unknown value and return the value
 322 of the other parameter. By doing so, you would assume "unknown" means "zero"
 323 and this is not true.
 324
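The same behavior can be seen in any language that has a "not a number"
value. In this small Python sketch NaN stands in for RRDtool's unknown
value, and the two rates are made-up numbers:

```python
import math

UNKNOWN = float("nan")  # stand-in for RRDtool's unknown value in this sketch

old_router = 120.0
new_router = UNKNOWN    # the new RRD has no data for this period yet

total = old_router + new_router
print(math.isnan(total))
```

The sum is unknown as well; the known 120.0 does not survive the addition.
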
 325 There has been a case where somebody was collecting data for over a year.
 326 A new piece of equipment was installed, a new RRD was created and the
 327 scripts were changed to add a counter from the old database and a counter
 328 from the new database. The result was disappointing, a large part of
 329 the statistics seemed to have vanished mysteriously ...
 330 They of course didn't, values from the old database (known values) were
 331 added to values from the new database (unknown values) and the result was
 332 unknown.
 333
 334 In this case, it is fairly reasonable to use a CDEF that alters unknown
 335 data into zero. The counters of the device were unknown (after all, it
 336 wasn't installed yet!) but you know that the data rate through the device
 337 had to be zero (because of the same reason: it was not installed).
 338
 339 There are some examples below that make this change.
 340
 341 =head2 Infinity
 342
 343 Infinite data is another form of a special number. It cannot be
 344 graphed because by definition you would never reach the infinite
 345 value. You can think of positive and negative infinity depending on
 346 the position relative to zero.
 347
 348 RRDtool is capable of representing (-not- graphing!) infinity by stopping
 349 at its current maximum (for positive infinity) or minimum (for negative
 350 infinity) without knowing this maximum (minimum).
 351
 352 Infinity in RRDtool is mostly used to draw an AREA without knowing its
 353 vertical dimensions. You can think of it as drawing an AREA with an
 354 infinite height and displaying only the part that is visible in the
 355 current graph. This is probably a good way to approximate infinity
 356 and it sure allows for some neat tricks. See below for examples.
 357
 358 =head2 Working with unknown data and infinity
 359
 360 Sometimes you would like to discard unknown data and pretend it is zero
 361 (or any other value for that matter) and sometimes you would like to
 362 pretend that known data is unknown (to discard known-to-be-wrong data).
 363 This is why CDEFs have support for unknown data. There are also examples
 364 available that show unknown data by using infinity.
 365
 366 =head1 Some examples
 367
 368 =head2 Example: using a recently created RRD
 369
 370 You have been keeping statistics on your router for over a year now. Recently
 371 you installed an extra router and you would like to show the combined
 372 throughput for these two devices.
 373
 374 If you just add up the counters from router.rrd and router2.rrd, you
 375 will add known data (from router.rrd) to unknown data (from router2.rrd) for
 376 the bigger part of your stats. You could solve this in a few ways:
 377
 378 =over 4
 379
 380 =item *
 381
 382 While creating the new database, fill it with zeros from the start to now.
 383 You have to make the database start at or before the least recent time in
 384 the other database.
 385
 386 =item *
 387
 388 Alternatively, you could use CDEF and alter unknown data to zero.
 389
 390 =back
 391
 392 Both methods have their pros and cons. The first method is troublesome and
 393 if you want to do that you have to figure it out yourself. It is not
 394 possible to create a database filled with zeros; you have to put them in
 395 manually. Implementing the second method is described next:
 396
 397 What we want is: "if the value is unknown, replace it with zero". This
 398 could be written in pseudo-code as:  if (value is unknown) then (zero)
 399 else (value). When reading the L<rrdgraph> manual you notice the "UN"
 400 function that returns zero or one. You also notice the "IF" function
 401 that takes zero or one as input.
 402
 403 First look at the "IF" function. It takes three values from the stack,
 404 the first value is the decision point, the second value is returned to
 405 the stack if the evaluation is "true" and if not, the third value is
 406 returned to the stack. We want the "UN" function to decide what happens
 407 so we combine those two functions in one CDEF.
 408
 409 Let's write down the two possible paths for the "IF" function:
 410
 411    if true  return a
 412    if false return b
 413
 414 In RPN:  C<result=x,a,b,IF> where "x" is either true or false.
 415
 416 Now we have to fill in "x", this should be the "(value is unknown)" part
 417 and this is in RPN:  C<result=value,UN>
 418
 419 We now combine them: C<result=value,UN,a,b,IF> and when we fill in the
 420 appropriate things for "a" and "b" we're finished:
 421
 422 C<CDEF:result=value,UN,0,value,IF>
 423
 424 You may want to read Steve Rader's RPN guide if you have difficulties
 425 with the way I explained this last example.
 426
 427 If you want to check this RPN expression, just mimic RRDtool behavior:
 428
 429    For any known value, the expression evaluates as follows:
 430    CDEF:result=value,UN,0,value,IF  (value,UN) is not true so it becomes 0
 431    CDEF:result=0,0,value,IF         "IF" will return the 3rd value
 432    CDEF:result=value                The known value is returned
 433
 434    For the unknown value, this happens:
 435    CDEF:result=value,UN,0,value,IF  (value,UN) is true so it becomes 1
 436    CDEF:result=1,0,value,IF         "IF" sees 1 and returns the 2nd value
 437    CDEF:result=0                    Zero is returned
 438
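The same check can be written in Python. The helpers "un" and "rpn_if" are
hypothetical models of the "UN" and "IF" functions, with NaN standing in
for the unknown value:

```python
import math

def un(value):
    """Model of UN: 1 if the value is unknown (NaN here), otherwise 0."""
    return 1 if math.isnan(value) else 0

def rpn_if(condition, if_true, if_false):
    """Model of IF: any non-zero condition selects the first alternative."""
    return if_true if condition != 0 else if_false

def unknown_to_zero(value):
    """Equivalent of CDEF:result=value,UN,0,value,IF."""
    return rpn_if(un(value), 0, value)

print(unknown_to_zero(42.0))
print(unknown_to_zero(float("nan")))
```

A known value passes through untouched and an unknown value comes out as
zero, just like in the two walk-throughs above.
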
 439 Of course, if you would like to see another value instead of zero, you
 440 can use that other value.
 441
 442 Eventually, when all unknown data is removed from the RRD, you may want
 443 to remove this rule so that unknown data is properly displayed.
 444
 445 =head2 Example: better handling of unknown data, by using time
 446
 447 The above example has one drawback. If you do log unknown data in
 448 your database after installing your new equipment, it will also be
 449 translated into zero and therefore you won't see that there was a
 450 problem. This is not good and what you really want to do is:
 451
 452 =over 4
 453
 454 =item *
 455
 456 If there is unknown data, look at the time that this sample was taken.
 457
 458 =item *
 459
 460 If the unknown value is before time xxx, make it zero.
 461
 462 =item *
 463
 464 If it is after time xxx, leave it as unknown data.
 465
 466 =back
 467
 468 This is doable: you can compare the time that the sample was taken
 469 to some known time. Assuming you started to monitor your device on
 470 Friday September 17, 1999, 00:35:57 MET DST. Translate this time in seconds
 471 since 1970-01-01 and it becomes 937'521'357. If you process unknown values
 472 that were received after this time, you want to leave them unknown and
 473 if they were "received" before this time, you want to translate them
 474 into zero (so you can effectively ignore them while adding them to your
 475 other routers' counters).
 476
 477 Translating Friday September 17, 1999, 00:35:57 MET DST into 937'521'357 can
 478 be done by, for instance, using GNU date (run in that same time zone):
 479
 480    date -d "19990917 00:35:57" +%s
 481
 482 You could also dump the database and see where the data starts to be
 483 known. There are several other ways of doing this, just pick one.
 484
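If you prefer to let a script do the conversion, the following Python
sketch arrives at the same number without depending on the time zone of
your machine. It assumes that MET DST means two hours ahead of UTC:

```python
from datetime import datetime, timedelta, timezone

# Assumption for this sketch: MET DST is two hours ahead of UTC.
met_dst = timezone(timedelta(hours=2))
start = datetime(1999, 9, 17, 0, 35, 57, tzinfo=met_dst)

epoch_seconds = int(start.timestamp())
print(epoch_seconds)
```
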
 485 Now we have to create the magic that allows us to process unknown
 486 values differently depending on the time that the sample was taken.
 487 This is a three-step process:
 488
 489 =over 4
 490
 491 =item 1.
 492
 493 If the timestamp of the value is after 937'521'357, leave it as is.
 494
 495 =item 2.
 496
 497 If the value is a known value, leave it as is.
 498
 499 =item 3.
 500
 501 Change the unknown value into zero.
 502
 503 =back
 504
 505 Let's look at part one:
 506
 507     if (true) return the original value
 508
 509 We rewrite this:
 510
 511     if (true) return "a"
 512     if (false) return "b"
 513
 514 We need to calculate true or false from step 1. There is a function
 515 available that returns the timestamp for the current sample. It is
 516 called, unsurprisingly, "TIME". This time has to be compared to
 517 a constant number, so we need "GT". The output of "GT" is true or false
 518 and this is good input to "IF". We want "if (time > 937521357) then
 519 (return a) else (return b)".
 520
 521 This process was already described thoroughly in the previous example,
 522 so let's do it quickly:
 523
 524    if (x) then a else b
 525       where x represents "time>937521357"
 526       where a represents the original value
 527       where b represents the outcome of the previous example
 528
 529    time>937521357       --> TIME,937521357,GT
 530
 531    if (x) then a else b --> x,a,b,IF
 532    substitute x         --> TIME,937521357,GT,a,b,IF
 533    substitute a         --> TIME,937521357,GT,value,b,IF
 534    substitute b         --> TIME,937521357,GT,value,value,UN,0,value,IF,IF
 535
 536 We end up with:
 537 C<CDEF:result=TIME,937521357,GT,value,value,UN,0,value,IF,IF>
 538
 539 This looks very complex, however, as you can see, it was not too hard to
 540 come up with.
 541
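Written as ordinary Python, with NaN standing in for unknown, the nested
"IF" reads as follows. The function name and the sample timestamps are
only there for illustration:

```python
import math

START = 937521357  # monitoring start, in seconds since 1970-01-01

def result(time, value):
    """Equivalent of TIME,937521357,GT,value,value,UN,0,value,IF,IF."""
    if time > START:
        return value                          # after the start: leave as is
    return 0 if math.isnan(value) else value  # before: unknown becomes zero

unknown = float("nan")
print(result(START - 300, unknown))   # before the start
print(result(START + 300, unknown))   # after the start
print(result(START - 300, 12.5))      # known values are never touched
```

Only an unknown value from before the start time is turned into zero; every
other combination is returned unchanged.
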
 542 =head2 Example: Pretending weird data isn't there
 543
 544 Suppose you have a problem that shows up as huge spikes in your graph.
 545 You know this happens and why, so you decide to work around the problem.
 546 Perhaps you're using your network to do a backup at night and by doing
 547 so you get almost 10Mb/s while the rest of your network activity does
 548 not produce numbers higher than 100kb/s.
 549
 550 There are two options:
 551
 552 =over 4
 553
 554 =item 1.
 555
 556 If the number exceeds 100kb/s it is wrong and you want it masked out
 557 by changing it into unknown.
 558
 559 =item 2.
 560
 561 You don't want the graph to show more than 100kb/s.
 562
 563 =back
 564
 565 Pseudo code: if (number > 100) then unknown else number
 566 or
 567 Pseudo code: if (number > 100) then 100 else number.
 568
 569 The second "problem" may also be solved by using the rigid option of
 570 RRDtool graph; however, this does not have the same result. In this example
 571 you can end up with a graph that does autoscaling. Also, if you use
 572 the numbers to display maxima they will be set to 100kb/s.
 573
 574 We use "IF" and "GT" again. "if (x) then (y) else (z)" is written
 575 down as "CDEF:result=x,y,z,IF"; now fill in x, y and z.
 576 For x you fill in "number greater than 100kb/s" becoming
 577 "number,100000,GT" (kilo is 1'000 and b/s is what we measure!).
 578 The "z" part is "number" in both cases and the "y" part is either
 579 "UNKN" for unknown or "100000" for 100kb/s.
 580
 581 The two CDEF expressions would be:
 582
 583     CDEF:result=number,100000,GT,UNKN,number,IF
 584     CDEF:result=number,100000,GT,100000,number,IF
 585
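To see what the two variants do to a series of numbers, here is a small
Python sketch. The sample rates are invented, and NaN is used as a stand-in
for the unknown value:

```python
LIMIT = 100000            # 100kb/s, expressed in b/s as in the CDEFs above
UNKNOWN = float("nan")    # stand-in for UNKN in this sketch

def mask_spikes(number):
    """Equivalent of number,100000,GT,UNKN,number,IF."""
    return UNKNOWN if number > LIMIT else number

def clip(number):
    """Equivalent of number,100000,GT,100000,number,IF."""
    return LIMIT if number > LIMIT else number

samples = [40000, 95000, 9800000, 60000]   # the third one is the backup
print([mask_spikes(s) for s in samples])
print([clip(s) for s in samples])
```

The first variant leaves a gap where the backup ran; the second one draws a
flat top at 100kb/s instead.
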
 586 =head2 Example: working on a certain time span
 587
 588 If you want a graph that spans a few weeks, but would only want to
 589 see some routers' data for one week, you need to "hide" the rest of
 590 the time frame. Don't ask me when this would be useful, it's just
 591 here for the example :)
 592
 593 We need to compare the time stamp to a begin date and an end date.
 594 Comparing isn't difficult:
 595
 596         TIME,begintime,GE
 597         TIME,endtime,LE
 598
 599 These two parts of the CDEF produce either 0 for false or 1 for true.
 600 We can now check if they are both 0 (or 1) using a few IF statements
 601 but, as Wataru Satoh pointed out, we can use the "*" or "+" functions
 602 as logical AND and logical OR.
 603
 604 For "*", the result will be zero (false) if either one of the two
 605 operands is zero.  For "+", the result will only be false (0) when
 606 two false (0) operands are added.  Warning: *any* number not
 607 equal to 0 will be considered "true". This means that, for instance,
 608 "-1,1,+" (which should be "true or true") will become FALSE ...
 609 In other words, use "+" only if you know for sure that you have positive
 610 numbers (or zero) only.
 611
 612 Let's compile the complete CDEF:
 613
 614         DEF:ds0=router1.rrd:ds0:AVERAGE
 615         CDEF:ds0modified=TIME,begintime,GE,TIME,endtime,LE,*,ds0,UNKN,IF
 616
 617 This will return the value of ds0 if both comparisons return true. You
 618 could also do it the other way around:
 619
 620         DEF:ds0=router1.rrd:ds0:AVERAGE
 621         CDEF:ds0modified=TIME,begintime,LT,TIME,endtime,GT,+,UNKN,ds0,IF
 622
 623 This will return an UNKNOWN if either comparison returns true.
 624
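The use of "*" and "+" as logical AND and OR, including the pitfall with
negative numbers, can be tried out in Python. The helper functions and the
two timestamps are hypothetical and only model the comparisons used here:

```python
def ge(a, b):
    """Model of GE: 1 if a >= b, otherwise 0."""
    return 1 if a >= b else 0

def le(a, b):
    """Model of LE: 1 if a <= b, otherwise 0."""
    return 1 if a <= b else 0

def is_true(x):
    """Any number not equal to 0 is considered true."""
    return x != 0

BEGINTIME, ENDTIME = 1000, 2000   # hypothetical timestamps

def inside(time):
    """Model of TIME,begintime,GE,TIME,endtime,LE,* (logical AND)."""
    return is_true(ge(time, BEGINTIME) * le(time, ENDTIME))

def outside(time):
    """Model of TIME,begintime,LT,TIME,endtime,GT,+ (logical OR)."""
    before = 1 if time < BEGINTIME else 0
    after = 1 if time > ENDTIME else 0
    return is_true(before + after)

print(inside(500), inside(1500), inside(2500))
print(outside(500), outside(1500), outside(2500))
print(is_true(-1 + 1))   # meant as "true or true", yet the sum is 0
```

The two tests are each other's opposite, which is why the "+" variant can
be used to select the unknown value and the "*" variant to select the data.
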
 625 =head2 Example: You suspect you have problems and want to see unknown data.
 626
 627 Suppose you add up the number of active users on several terminal servers.
 628 If one of them doesn't give an answer (or an incorrect one) you get "NaN"
 629 in the database ("Not a Number") and NaN is evaluated as Unknown.
 630
 631 In this case, you would like to be alerted to it and the sum of the
 632 remaining values is of no value to you.
 633
 634 It would be something like:
 635
 636     DEF:users1=location1.rrd:onlineTS1:LAST
 637     DEF:users2=location1.rrd:onlineTS2:LAST
 638     DEF:users3=location2.rrd:onlineTS1:LAST
 639     DEF:users4=location2.rrd:onlineTS2:LAST
 640     CDEF:allusers=users1,users2,users3,users4,+,+,+
 641
 642 If you now plot allusers, unknown data in one of users1..users4 will
 643 show up as a gap in your graph. You want to modify this to show a
 644 bright red line, not a gap.
 645
 646 Define an extra CDEF that is unknown if all is okay and is infinite if
 647 there is an unknown value:
 648
 649     CDEF:wrongdata=allusers,UN,INF,UNKN,IF
 650
 651 "allusers,UN" will evaluate to either true or false, it is the (x) part
 652 of the "IF" function and it checks if allusers is unknown.
 653 The (y) part of the "IF" function is set to "INF" (which means infinity)
 654 and the (z) part of the function returns "UNKN".
 655
 656 The logic is: if (allusers == unknown) then return INF else return UNKN.
 657
 658 You can now use AREA to display this "wrongdata" in bright red. If it
 659 is unknown (because allusers is known) then the red AREA won't show up.
 660 If the value is INF (because allusers is unknown) then the red AREA will
 661 be filled in on the graph at that particular time.
 662
 663    AREA:allusers#0000FF:combined user count
 664    AREA:wrongdata#FF0000:unknown data
 665
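A quick way to convince yourself of the logic is to model it in Python.
NaN and the floating point infinity stand in for "UNKN" and "INF", and the
user counts are invented:

```python
import math

INF = float("inf")
UNKNOWN = float("nan")

def wrongdata(allusers):
    """Equivalent of CDEF:wrongdata=allusers,UN,INF,UNKN,IF."""
    return INF if math.isnan(allusers) else UNKNOWN

all_ok = 12 + 7 + 3 + 9               # users1..users4 are all known
one_missing = 12 + 7 + UNKNOWN + 9    # users3 did not answer

print(wrongdata(all_ok))
print(wrongdata(one_missing))
```

As long as the sum is known, "wrongdata" is unknown and nothing is drawn; a
single unknown input makes the sum unknown and "wrongdata" infinite.
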
 666 =head2 Same example useful with STACKed data:
 667
 668 If you use stack in the previous example (as I would do) then you don't
 669 add up the values. Therefore, there is no relationship between the
 670 four values and you don't get a single value to test.
 671 Suppose users3 would be unknown at one point in time: users1 is plotted,
 672 users2 is stacked on top of users1, users3 is unknown and therefore
 673 nothing happens, users4 is stacked on top of users2.
 674 Add the extra CDEFs anyway and use them to overlay the "normal" graph:
 675
 676    DEF:users1=location1.rrd:onlineTS1:LAST
 677    DEF:users2=location1.rrd:onlineTS2:LAST
 678    DEF:users3=location2.rrd:onlineTS1:LAST
 679    DEF:users4=location2.rrd:onlineTS2:LAST
 680    CDEF:allusers=users1,users2,users3,users4,+,+,+
 681    CDEF:wrongdata=allusers,UN,INF,UNKN,IF
 682    AREA:users1#0000FF:users at ts1
 683    STACK:users2#00FF00:users at ts2
 684    STACK:users3#00FFFF:users at ts3
 685    STACK:users4#FFFF00:users at ts4
 686    AREA:wrongdata#FF0000:unknown data
 687
 688 If there is unknown data in one of users1..users4, the "wrongdata" AREA
 689 will be drawn and because it starts at the X-axis and has infinite height
 690 it will effectively overwrite the STACKed parts.
 691
 692 You could combine the two CDEF lines into one (we don't use "allusers")
 693 if you like.  But there are good reasons for writing two CDEFs:
 694
 695 =over 4
 696
 697 =item *
 698
 699 It improves the readability of the script.
 700
 701 =item *
 702
 703 It can be used inside GPRINT to display the total number of users.
 704
 705 =back
 706
 707 If you choose to combine them, you can substitute the "allusers" in the
 708 second CDEF with the part after the equal sign from the first line:
 709
 710    CDEF:wrongdata=users1,users2,users3,users4,+,+,+,UN,INF,UNKN,IF
 711
 712 If you do so, you won't be able to use these next GPRINTs:
 713
 714    COMMENT:"Total number of users seen"
 715    GPRINT:allusers:MAX:"Maximum: %6.0lf"
 716    GPRINT:allusers:MIN:"Minimum: %6.0lf"
 717    GPRINT:allusers:AVERAGE:"Average: %6.0lf"
 718    GPRINT:allusers:LAST:"Current: %6.0lf\n"
 719
 720 =head1 The examples from the RRD graph manual page
 721
 722 =head2 Degrees Celsius vs. Degrees Fahrenheit
 723
 724    rrdtool graph demo.png --title="Demo Graph" \
 725       DEF:cel=demo.rrd:exhaust:AVERAGE \
 726       CDEF:far=cel,1.8,*,32,+ \
 727       LINE2:cel#00a000:"D. Celsius" \
 728       LINE2:far#ff0000:"D. Fahrenheit\c"
 729
 730 This example gets the DS called "exhaust" from database "demo.rrd"
 731 and puts the values in variable "cel". The CDEF used is evaluated
 732 as follows:
 733
 734    CDEF:far=cel,1.8,*,32,+
 735    1. push variable "cel"
 736    2. push 1.8
 737    3. push function "multiply" and process it
 738       The stack now contains values that are 1.8 times "cel"
 739    4. push 32
 740    5. push function "plus" and process it
 741    6. the resulting value is now "cel*1.8+32"
 742
 743 Note that the Celsius to Fahrenheit function is "9/5*cel+32" and
 744 9/5 is exactly 1.8, so no precision is lost. The reverse conversion
 745 would be "5/9*(far-32)", written in RPN as "far,32,-,5,*,9,/".
 746
 747 =head2 Changing unknown into zero
 748
 749    rrdtool graph demo.png --title="Demo Graph" \
 750       DEF:idat1=interface1.rrd:ds0:AVERAGE \
 751       DEF:idat2=interface2.rrd:ds0:AVERAGE \
 752       DEF:odat1=interface1.rrd:ds1:AVERAGE \
 753       DEF:odat2=interface2.rrd:ds1:AVERAGE \
 754       CDEF:agginput=idat1,UN,0,idat1,IF,idat2,UN,0,idat2,IF,+,8,* \
 755       CDEF:aggoutput=odat1,UN,0,odat1,IF,odat2,UN,0,odat2,IF,+,8,* \
 756       AREA:agginput#00cc00:Input Aggregate \
 757       LINE1:aggoutput#0000FF:Output Aggregate
 758
 759 These two CDEFs are built from several functions. It helps to split
 760 them when viewing what they do. Starting with the first CDEF we would
 761 get:
 762
 763  idat1,UN --> a
 764  0        --> b
 765  idat1    --> c
 766  if (a) then (b) else (c)
 767
 768 The result is therefore "0" if it is true that "idat1" is unknown.
 769 If not, the original value of "idat1" is put back on the stack.
 770 Let's call this answer "d". The process is repeated for the next
 771 five items on the stack, it is done the same and will return answer
 772 "h". The resulting stack is therefore "d,h".
 773 The expression has been simplified to "d,h,+,8,*" and it will now be
 774 easy to see that we add "d" and "h", and multiply the result with eight.
 775
 776 The end result is that we have added "idat1" and "idat2" and in the
 777 process we effectively ignored unknown values. The result is multiplied
 778 by eight, most likely to convert bytes/s to bits/s.
 779
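To check the reasoning, the complete expression can be fed to a small RPN
evaluator. The Python sketch below is a hypothetical helper that supports
only the functions used in this CDEF ("UN", "IF", "+" and "*") and uses NaN
for unknown; the input rates are made up.

```python
import math

def evaluate(expression, variables):
    """Evaluate an RPN expression that uses only UN, IF, + and *."""
    stack = []
    for token in expression.split(","):
        if token == "UN":
            stack.append(1.0 if math.isnan(stack.pop()) else 0.0)
        elif token == "IF":
            # Popped in reverse order of pushing: else-value, then-value, test.
            if_false, if_true, condition = stack.pop(), stack.pop(), stack.pop()
            stack.append(if_true if condition != 0 else if_false)
        elif token == "+":
            stack.append(stack.pop() + stack.pop())
        elif token == "*":
            stack.append(stack.pop() * stack.pop())
        else:
            stack.append(variables[token] if token in variables else float(token))
    return stack.pop()

AGGINPUT = "idat1,UN,0,idat1,IF,idat2,UN,0,idat2,IF,+,8,*"
both_known = evaluate(AGGINPUT, {"idat1": 100.0, "idat2": 50.0})
one_unknown = evaluate(AGGINPUT, {"idat1": float("nan"), "idat2": 50.0})
print(both_known, one_unknown)
```

With both inputs known the result is (100+50)*8; with "idat1" unknown only
the 50 from "idat2" is counted.
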
 780 =head2 Infinity demo
 781
 782    rrdtool graph example.png --title="INF demo" \
 783       DEF:val1=some.rrd:ds0:AVERAGE \
 784       DEF:val2=some.rrd:ds1:AVERAGE \
 785       DEF:val3=some.rrd:ds2:AVERAGE \
 786       DEF:val4=other.rrd:ds0:AVERAGE \
 787       CDEF:background=val4,POP,TIME,7200,%,3600,LE,INF,UNKN,IF \
 788       CDEF:wipeout=val1,val2,val3,val4,+,+,+,UN,INF,UNKN,IF \
 789       AREA:background#F0F0F0 \
 790       AREA:val1#0000FF:Value1 \
 791       STACK:val2#00C000:Value2 \
 792       STACK:val3#FFFF00:Value3 \
 793       STACK:val4#FFC000:Value4 \
 794       AREA:wipeout#FF0000:Unknown
 795
 796 This demo demonstrates two ways to use infinity. It is a bit tricky
 797 to see what happens in the "background" CDEF.
 798
 799    "val4,POP,TIME,7200,%,3600,LE,INF,UNKN,IF"
 800
 801 This RPN takes the value of "val4" as input and then immediately
 802 removes it from the stack using "POP". The stack is now empty but
 803 as a side effect we now know the time that this sample was taken.
 804 This time is put on the stack by the "TIME" function.
 805
 806 "TIME,7200,%" takes the modulo of time and 7'200 (which is two hours).
 807 The resulting value on the stack will be a number in the range from
 808 0 to 7199.
 809
 810 For people who don't know the modulo function: it is the remainder
 811 after an integer division. If you divide 16 by 3, the answer would
 812 be 5 and the remainder would be 1. So, "16,3,%" returns 1.
 813
 814 We have the result of "TIME,7200,%" on the stack, let's call this
 815 "a". The start of the RPN has become "a,3600,LE" and this checks
 816 if "a" is less than or equal to "3600". It is true half of the time.
 817 We now have to process the rest of the RPN and this is only a simple
 818 "IF" function that returns either "INF" or "UNKN" depending on the
 819 time. This is returned to variable "background".
 820
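The striping effect of the "background" CDEF is easy to reproduce in
Python. This sketch uses the floating point infinity and NaN as stand-ins
for "INF" and "UNKN"; the timestamps are arbitrary:

```python
INF = float("inf")
UNKNOWN = float("nan")

def background(time):
    """Equivalent of TIME,7200,%,3600,LE,INF,UNKN,IF for one timestamp."""
    return INF if time % 7200 <= 3600 else UNKNOWN

print(16 % 3)   # the modulo example from the text
for t in (0, 3600, 3601, 7199, 7200):
    print(t, background(t))
```

The value flips between infinite and unknown every hour, which produces the
alternating light bands behind the data.
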
 821 The second CDEF has been discussed earlier in this document so we
 822 won't do that here.
 823
 824 Now you can draw the different layers. Start with the background
 825 that is either unknown (nothing to see) or infinite (the whole
 826 positive part of the graph gets filled).
 827
 828 Next you draw the data on top of this background, it will overlay
 829 the background. Suppose one of val1..val4 would be unknown, in that
 830 case you end up with only three bars stacked on top of each other.
 831 You don't want to see this because the data is only valid when all
 832 four variables are valid. This is why you use the second CDEF, it
 833 will overlay the data with an AREA so the data cannot be seen anymore.
 834
 835 If your data can also have negative values you also need to overwrite
 836 the other half of your graph. This can be done in a relatively simple
 837 way: what you need is the "wipeout" variable and place a negative
 838 sign before it:  "CDEF:wipeout2=wipeout,-1,*"
 839
 840 =head2 Filtering data
 841
 842 You may do some complex data filtering:
 843
 844   MEDIAN FILTER (strictly a three-point moving average): filters shot noise
 845
 846     DEF:var=database.rrd:traffic:AVERAGE
 847     CDEF:prev1=PREV(var)
 848     CDEF:prev2=PREV(prev1)
 849     CDEF:prev3=PREV(prev2)
 850     CDEF:median=prev1,prev2,prev3,+,+,3,/
 851     LINE3:median#000077:filtered
 852     LINE1:prev2#007700:'raw data'
 853
 854
 855   DERIVATIVE:
 856
 857     DEF:var=database.rrd:traffic:AVERAGE
 858     CDEF:prev1=PREV(var)
 859     CDEF:time=TIME
 860     CDEF:prevtime=PREV(time)
 861     CDEF:derivate=var,prev1,-,time,prevtime,-,/
 862     LINE3:derivate#000077:derivate
 863     LINE1:var#007700:'raw data'
 864
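What these two filters compute can be modelled with plain Python lists.
The sketch assumes that "PREV(vname)" yields the value of that variable at
the previous time step and an unknown value for the very first step; the
data, the 300-second step and the helper "prev" are invented for the
example, and NaN stands in for unknown.

```python
UNKNOWN = float("nan")

def prev(series):
    """Model of PREV(vname): each slot holds the preceding slot's value."""
    return [UNKNOWN] + series[:-1]

var = [10.0, 10.0, 40.0, 10.0, 10.0, 10.0]   # one-sample spike of 40
prev1 = prev(var)
prev2 = prev(prev1)
prev3 = prev(prev2)

# CDEF:median=prev1,prev2,prev3,+,+,3,/
filtered = [(a + b + c) / 3 for a, b, c in zip(prev1, prev2, prev3)]
print(filtered)

# CDEF:derivate=var,prev1,-,time,prevtime,-,/
time = [0, 300, 600, 900, 1200, 1500]        # hypothetical 300-second step
prevtime = prev(time)
derivate = [(v - p) / (t - pt)
            for v, p, t, pt in zip(var, prev1, time, prevtime)]
print(derivate)
```

The first slots stay unknown because there is no earlier sample to look at.
The spike of 40 is spread out over three filtered values of 20, and the
derivative shows it as a rise followed by an equally large fall.
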
 865
 866 =head1 Out of ideas for now
 867
 868 This document was created from questions asked by either myself or by
 869 other people on the RRDtool mailing list. Please let me know if you
 870 find errors in it or if you have trouble understanding it. If you
 871 think there should be an addition, mail me:
 872 E<lt>alex@ergens.op.het.netE<gt>
 873
 874 Remember: B<No feedback equals no changes!>
 875
 876 =head1 SEE ALSO
 877
 878 The RRDtool manpages
 879
 880 =head1 AUTHOR
 881
 882 Alex van den Bogaerdt
 883 E<lt>alex@ergens.op.het.netE<gt>