doc/cdeftutorial.pod

   1 =head1 NAME
   2
   3 cdeftutorial - Alex van den Bogaerdt's CDEF tutorial
   4
   5 =head1 DESCRIPTION
   6
   7 Intention of this document: to provide some examples of the commonly
   8 used parts of RRDtool's CDEF language.
   9
  10 If you think some important feature is not explained properly, and if
  11 adding it to this document would benefit most users, please do ask me
  12 to add it.  I will then try to provide an answer in the next release
  13 of this tutorial.  No feedback equals no changes! Additions to
  14 this document are also welcome.  -- Alex van den Bogaerdt
  15 E<lt>alex@vandenbogaerdt.nlE<gt>
  16
  17 =head2 Why this tutorial?
  18
  19 One of the powerful parts of RRDtool is its ability to do all sorts
  20 of calculations on the data retrieved from its databases. However,
  21 RRDtool's many options and syntax make it difficult for the average
  22 user to understand. The manuals are good at explaining what these
  23 options do; however they do not (and should not) explain in detail
  24 why they are useful. As with my RRDtool tutorial: if you want a
  25 simple document in simple language you should read this tutorial.
  26 If you are happy with the official documentation, you may find this
  27 document too simple or even boring. If you do choose to read this
   28 tutorial, I also expect you to have read and fully understood my
  29 other tutorial.
  30
  31 =head2 More reading
  32
  33 If you have difficulties with the way I try to explain it please read
  34 Steve Rader's L<rpntutorial>. It may help you understand how this all works.
  35
  36 =head1 What are CDEFs?
  37
  38 When retrieving data from an RRD, you are using a "DEF" to work with
  39 that data. Think of it as a variable that changes over time (where
  40 time is the x-axis). The value of this variable is what is found in
  41 the database at that particular time and you can't do any
   42 modifications on it. This is what CDEFs are for: they take values
  43 from DEFs and perform calculations on them.
  44
  45 =head1 Syntax
  46
  47    DEF:var_name_1=some.rrd:ds_name:CF
  48    CDEF:var_name_2=RPN_expression
  49
  50 You first define "var_name_1" to be data collected from data source
  51 "ds_name" found in RRD "some.rrd" with consolidation function "CF".
  52
  53 Assume the ifInOctets SNMP counter is saved in mrtg.rrd as the DS "in".
  54 Then the following DEF defines a variable for the average of that
  55 data source:
  56
  57    DEF:inbytes=mrtg.rrd:in:AVERAGE
  58
   59 Say you want to display bits per second (instead of bytes per second
   60 as stored in the database).  You have to define a calculation
   61 (hence "CDEF") on variable "inbytes" and use the new variable (inbits)
   62 instead of the original:
  63
  64    CDEF:inbits=inbytes,8,*
  65
  66 This tells RRDtool to multiply inbytes by eight to get inbits. I'll
  67 explain later how this works. In the graphing or printing functions,
  68 you can now use inbits where you would use inbytes otherwise.
  69
  70 Note that the variable name used in the CDEF (inbits) must not be the
  71 same as the variable named in the DEF (inbytes)!
  72
  73 =head1 RPN-expressions
  74
  75 RPN is short-hand for Reverse Polish Notation. It works as follows.
  76 You put the variables or numbers on a stack. You also put operations
  77 (things-to-do) on the stack and this stack is then processed. The result
  78 will be placed on the stack. At the end, there should be exactly one
  79 number left: the outcome of the series of operations. If there is not
  80 exactly one number left, RRDtool will complain loudly.
  81
   82 The multiplication by eight shown above will look like this:
  83
  84 =over 4
  85
  86 =item 1.
  87
  88 Start with an empty stack
  89
  90 =item 2.
  91
  92 Put the content of variable inbytes on the stack
  93
  94 =item 3.
  95
  96 Put the number eight on the stack
  97
  98 =item 4.
  99
 100 Put the operation multiply on the stack
 101
 102 =item 5.
 103
 104 Process the stack
 105
 106 =item 6.
 107
 108 Retrieve the value from the stack and put it in variable inbits
 109
 110 =back
 111
  112 We will now do an example with real numbers. Suppose the variable
  113 inbytes has the value 10. The stack would then be:
 114
 115 =over 4
 116
 117 =item 1.
 118
 119 ||
 120
 121 =item 2.
 122
 123 |10|
 124
 125 =item 3.
 126
 127 |10|8|
 128
 129 =item 4.
 130
 131 |10|8|*|
 132
 133 =item 5.
 134
 135 |80|
 136
 137 =item 6.
 138
 139 ||
 140
 141 =back
 142
 143 Processing the stack (step 5) will retrieve one value from the stack
 144 (from the right at step 4). This is the operation multiply and this
 145 takes two values off the stack as input. The result is put back on the
 146 stack (the value 80 in this case). For multiplication the order doesn't
 147 matter, but for other operations like subtraction and division it does.
 148 Generally speaking you have the following order:
 149
 150    y = A - B  -->  y=minus(A,B)  -->  CDEF:y=A,B,-
 151
 152 This is not very intuitive (at least most people don't think so). For
 153 the function f(A,B) you reverse the position of "f", but you do not
 154 reverse the order of the variables.
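
If you want to see the stack at work, a few lines of Python can stand in for RRDtool here. This is only a sketch: the function name "rpn" and its operator table are made up for this tutorial, it is not how RRDtool itself is implemented, and it only knows the four basic arithmetic operators.

```python
import operator

# Hypothetical helper for illustration only; not part of RRDtool.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def rpn(expression, variables=None):
    """Evaluate a comma-separated RPN expression and return the one result."""
    variables = variables or {}
    stack = []
    for token in expression.split(","):
        if token in OPS:
            right = stack.pop()   # the value that was pushed last
            left = stack.pop()    # the value that was pushed first
            stack.append(OPS[token](left, right))
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("expected exactly one value left on the stack")
    return stack[0]

print(rpn("inbytes,8,*", {"inbytes": 10}))   # inbytes times eight
print(rpn("A,B,-", {"A": 7, "B": 2}))        # A minus B, not B minus A
```

Note that for "A,B,-" the value pushed first (A) ends up on the left of the minus sign, which is exactly the order described above.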
 155
 156 =head1 Converting your wishes to RPN
 157
  158 First, get a clear picture of what you want to do. Break down the problem
  159 into smaller portions until they cannot be split any further. Then it is rather
 160 simple to convert your ideas into RPN.
 161
 162 Suppose you have several RRDs and would like to add up some counters in
 163 them. These could be, for instance, the counters for every WAN link you
 164 are monitoring.
 165
 166 You have:
 167
 168    router1.rrd with link1in link2in
 169    router2.rrd with link1in link2in
 170    router3.rrd with link1in link2in
 171
 172 Suppose you would like to add up all these counters, except for link2in
 173 inside router2.rrd. You need to do:
 174
 175 (in this example, "router1.rrd:link1in" means the DS link1in inside the
 176 RRD router1.rrd)
 177
 178    router1.rrd:link1in
 179    router1.rrd:link2in
 180    router2.rrd:link1in
 181    router3.rrd:link1in
 182    router3.rrd:link2in
 183    --------------------   +
 184    (outcome of the sum)
 185
 186 As a mathematical function, this could be written:
 187
  188 C<add(router1.rrd:link1in , router1.rrd:link2in , router2.rrd:link1in , router3.rrd:link1in , router3.rrd:link2in)>
 189
 190 With RRDtool and RPN, first, define the inputs:
 191
 192    DEF:a=router1.rrd:link1in:AVERAGE
 193    DEF:b=router1.rrd:link2in:AVERAGE
 194    DEF:c=router2.rrd:link1in:AVERAGE
 195    DEF:d=router3.rrd:link1in:AVERAGE
 196    DEF:e=router3.rrd:link2in:AVERAGE
 197
 198 Now, the mathematical function becomes: C<add(a,b,c,d,e)>
 199
 200 In RPN, there's no operator that sums more than two values so you need
 201 to do several additions. You add a and b, add c to the result, add d
 202 to the result and add e to the result.
 203
 204    push a:         a     stack contains the value of a
 205    push b and add: b,+   stack contains the result of a+b
 206    push c and add: c,+   stack contains the result of a+b+c
 207    push d and add: d,+   stack contains the result of a+b+c+d
 208    push e and add: e,+   stack contains the result of a+b+c+d+e
 209
 210 What was calculated here would be written down as:
 211
  212    ( ( ( (a+b) + c) + d) + e)
 213
 214 This is in RPN:  C<CDEF:result=a,b,+,c,+,d,+,e,+>
 215
  216 This is correct but it can be made clearer to humans. It does
 217 not matter if you add a to b and then add c to the result or first
 218 add b to c and then add a to the result. This makes it possible to
 219 rewrite the RPN into C<CDEF:result=a,b,c,d,e,+,+,+,+> which is
 220 evaluated differently:
 221
 222    push value of variable a on the stack: a
 223    push value of variable b on the stack: a b
 224    push value of variable c on the stack: a b c
 225    push value of variable d on the stack: a b c d
 226    push value of variable e on the stack: a b c d e
 227    push operator + on the stack:          a b c d e +
 228    and process it:                        a b c P   (where P == d+e)
 229    push operator + on the stack:          a b c P +
 230    and process it:                        a b Q     (where Q == c+P)
 231    push operator + on the stack:          a b Q +
 232    and process it:                        a R       (where R == b+Q)
 233    push operator + on the stack:          a R +
 234    and process it:                        S         (where S == a+R)
 235
  236 As you can see the RPN expression C<a,b,c,d,e,+,+,+,+> will evaluate to
  237 C<((((d+e)+c)+b)+a)> and it has the same outcome as C<a,b,+,c,+,d,+,e,+>.
  238 This works because addition is both commutative and associative,
  239 but you may forget these terms right away, as long as you remember what
  240 they mean.
 241
 242 Now look at an expression that contains a multiplication:
 243
 244 First in normal math: C<let result = a+b*c>. In this case you can't
 245 choose the order yourself, you have to start with the multiplication
  246 and then add a to it. You may alter the position of b and c, but you
  247 must not alter the position of a and b.
  248
  249 You have to take this into consideration when converting this expression
 250 into RPN. Read it as: "Add the outcome of b*c to a" and then it is
 251 easy to write the RPN expression: C<result=a,b,c,*,+>
 252 Another expression that would return the same: C<result=b,c,*,a,+>
 253
 254 In normal math, you may encounter something like "a*(b+c)" and this
  255 can also be converted into RPN. The parentheses just tell you to first
 256 add b and c, and then multiply a with the result. Again, now it is
 257 easy to write it in RPN: C<result=a,b,c,+,*>. Note that this is very
 258 similar to one of the expressions in the previous paragraph, only the
 259 multiplication and the addition changed places.
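
If you doubt whether two RPN expressions really give the same answer, you can let a small stand-in evaluator check it for you. The sketch below is not RRDtool code: the function "rpn" is made up for this tutorial, it only handles "+" and "*", and the values given to a..e are arbitrary sample numbers.

```python
import operator

# Hypothetical helper for illustration only; not part of RRDtool.
OPS = {"+": operator.add, "*": operator.mul}

def rpn(expression, variables):
    """Evaluate a comma-separated RPN expression of variables, + and *."""
    stack = []
    for token in expression.split(","):
        if token in OPS:
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(variables[token])
    assert len(stack) == 1, "exactly one value must remain"
    return stack[0]

# Arbitrary sample values, chosen only for this example.
values = {"a": 2, "b": 3, "c": 5, "d": 7, "e": 11}

# Both ways of adding up five values give the same total.
print(rpn("a,b,+,c,+,d,+,e,+", values), rpn("a,b,c,d,e,+,+,+,+", values))

# a+b*c can be written in two ways ...
print(rpn("a,b,c,*,+", values), rpn("b,c,*,a,+", values))

# ... while a*(b+c) only swaps the two operators at the end.
print(rpn("a,b,c,+,*", values))
```

With these sample values a+b*c is 17 while a*(b+c) is 16, so swapping the two operators really does change the calculation.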
 260
  261 When you have problems with RPN or when RRDtool is complaining, it's
  262 usually a good idea to write down the stack on a piece of paper
  263 and see what happens. Have the manual ready and pretend to be RRDtool.
  264 Just do all the math by hand to see what happens. I'm sure this will
  265 solve most, if not all, problems you encounter.
 266
 267 =head1 Some special numbers
 268
 269 =head2 The unknown value
 270
 271 Sometimes collecting your data will fail. This can be very common,
 272 especially when querying over busy links. RRDtool can be configured
 273 to allow for one (or even more) unknown value(s) and calculate the missing
 274 update. You can, for instance, query your device every minute. This is
  275 creating one so-called PDP or primary data point per minute. If you
 276 defined your RRD to contain an RRA that stores 5-minute values, you need
 277 five of those PDPs to create one CDP (consolidated data point).
 278 These PDPs can become unknown in two cases:
 279
 280 =over 4
 281
 282 =item 1.
 283
 284 The updates are too far apart. This is tuned using the "heartbeat" setting.
 285
 286 =item 2.
 287
 288 The update was set to unknown on purpose by inserting no value (using the
 289 template option) or by using "U" as the value to insert.
 290
 291 =back
 292
 293 When a CDP is calculated, another mechanism determines if this CDP is valid
 294 or not. If there are too many PDPs unknown, the CDP is unknown as well.
 295 This is determined by the xff factor. Please note that one unknown counter
 296 update can result in two unknown PDPs! If you only allow for one unknown
 297 PDP per CDP, this makes the CDP go unknown!
 298
 299 Suppose the counter increments with one per second and you retrieve it
 300 every minute:
 301
 302    counter value    resulting rate
 303    10'000
 304    10'060            1; (10'060-10'000)/60 == 1
 305    10'120            1; (10'120-10'060)/60 == 1
 306    unknown           unknown; you don't know the last value
 307    10'240            unknown; you don't know the previous value
 308    10'300            1; (10'300-10'240)/60 == 1
 309
  310 If the CDP was to be calculated from the last five updates, it would get
  311 two unknown PDPs and three known PDPs, so 40% of its PDPs are unknown.
  312 If xff had been set to 0.5, which by the way is a commonly used factor,
  313 the CDP would have a known value of 1. If xff had been set to 0.2 then
  314 the resulting CDP would be unknown.
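
The table and the xff reasoning above can be replayed in Python. Treat this as a simplified sketch, not as RRDtool's real update code: the function names are made up, unknown is modelled as None, every update is assumed to arrive exactly 60 seconds after the previous one, and I assume the CDP becomes unknown as soon as the unknown fraction of its PDPs exceeds xff.

```python
def rates(counter_values, step=60):
    """Turn counter readings (None = unknown) into per-second rates."""
    result = []
    for previous, current in zip(counter_values, counter_values[1:]):
        if previous is None or current is None:
            result.append(None)          # one end is missing: rate unknown
        else:
            result.append((current - previous) / step)
    return result

def consolidate_average(pdps, xff):
    """Average the known PDPs, or return None if too many are unknown.

    Assumption for this sketch: the CDP is unknown when the number of
    unknown PDPs is larger than xff times the number of PDPs.
    """
    known = [p for p in pdps if p is not None]
    unknown = len(pdps) - len(known)
    if not known or unknown > len(pdps) * xff:
        return None
    return sum(known) / len(known)

readings = [10000, 10060, 10120, None, 10240, 10300]
pdps = rates(readings)
print(pdps)                              # [1.0, 1.0, None, None, 1.0]
print(consolidate_average(pdps, 0.5))    # 1.0
print(consolidate_average(pdps, 0.2))    # None
```

One unknown reading produces two unknown rates, because it is missing both as the "current" and as the "previous" value.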
 315
 316 You have to decide the proper values for heartbeat, number of PDPs per
 317 CDP and the xff factor. As you can see from the previous text they define
 318 the behavior of your RRA.
 319
 320 =head2 Working with unknown data in your database
 321
 322 As you have read in the previous chapter, entries in an RRA can be
 323 set to the unknown value. If you do calculations with this type of
 324 value, the result has to be unknown too. This means that an expression
 325 such as C<result=a,b,+> will be unknown if either a or b is unknown.
 326 It would be wrong to just ignore the unknown value and return the value
 327 of the other parameter. By doing so, you would assume "unknown" means "zero"
 328 and this is not true.
 329
 330 There has been a case where somebody was collecting data for over a year.
 331 A new piece of equipment was installed, a new RRD was created and the
 332 scripts were changed to add a counter from the old database and a counter
  333 from the new database. The result was disappointing: a large part of
  334 the statistics seemed to have vanished mysteriously ...
  335 Of course they hadn't: values from the old database (known values) were
 336 added to values from the new database (unknown values) and the result was
 337 unknown.
 338
 339 In this case, it is fairly reasonable to use a CDEF that alters unknown
 340 data into zero. The counters of the device were unknown (after all, it
 341 wasn't installed yet!) but you know that the data rate through the device
 342 had to be zero (because of the same reason: it was not installed).
 343
 344 There are some examples below that make this change.
 345
 346 =head2 Infinity
 347
 348 Infinite data is another form of a special number. It cannot be
 349 graphed because by definition you would never reach the infinite
 350 value. You can think of positive and negative infinity depending on
 351 the position relative to zero.
 352
 353 RRDtool is capable of representing (-not- graphing!) infinity by stopping
 354 at its current maximum (for positive infinity) or minimum (for negative
 355 infinity) without knowing this maximum (minimum).
 356
 357 Infinity in RRDtool is mostly used to draw an AREA without knowing its
 358 vertical dimensions. You can think of it as drawing an AREA with an
 359 infinite height and displaying only the part that is visible in the
 360 current graph. This is probably a good way to approximate infinity
 361 and it sure allows for some neat tricks. See below for examples.
 362
 363 =head2 Working with unknown data and infinity
 364
 365 Sometimes you would like to discard unknown data and pretend it is zero
 366 (or any other value for that matter) and sometimes you would like to
 367 pretend that known data is unknown (to discard known-to-be-wrong data).
 368 This is why CDEFs have support for unknown data. There are also examples
 369 available that show unknown data by using infinity.
 370
 371 =head1 Some examples
 372
 373 =head2 Example: using a recently created RRD
 374
  375 You have been keeping statistics on your router for over a year now. Recently
 376 you installed an extra router and you would like to show the combined
 377 throughput for these two devices.
 378
 379 If you just add up the counters from router.rrd and router2.rrd, you
 380 will add known data (from router.rrd) to unknown data (from router2.rrd) for
 381 the bigger part of your stats. You could solve this in a few ways:
 382
 383 =over 4
 384
 385 =item *
 386
 387 While creating the new database, fill it with zeros from the start to now.
 388 You have to make the database start at or before the least recent time in
 389 the other database.
 390
 391 =item *
 392
 393 Alternatively, you could use CDEF and alter unknown data to zero.
 394
 395 =back
 396
 397 Both methods have their pros and cons. The first method is troublesome and
 398 if you want to do that you have to figure it out yourself. It is not
 399 possible to create a database filled with zeros, you have to put them in
 400 manually. Implementing the second method is described next:
 401
 402 What we want is: "if the value is unknown, replace it with zero". This
 403 could be written in pseudo-code as:  if (value is unknown) then (zero)
 404 else (value). When reading the L<rrdgraph> manual you notice the "UN"
 405 function that returns zero or one. You also notice the "IF" function
 406 that takes zero or one as input.
 407
 408 First look at the "IF" function. It takes three values from the stack,
 409 the first value is the decision point, the second value is returned to
 410 the stack if the evaluation is "true" and if not, the third value is
 411 returned to the stack. We want the "UN" function to decide what happens
 412 so we combine those two functions in one CDEF.
 413
  414 Let's write down the two possible paths for the "IF" function:
 415
 416    if true  return a
 417    if false return b
 418
 419 In RPN:  C<result=x,a,b,IF> where "x" is either true or false.
 420
 421 Now we have to fill in "x", this should be the "(value is unknown)" part
 422 and this is in RPN:  C<result=value,UN>
 423
 424 We now combine them: C<result=value,UN,a,b,IF> and when we fill in the
 425 appropriate things for "a" and "b" we're finished:
 426
 427 C<CDEF:result=value,UN,0,value,IF>
 428
 429 You may want to read Steve Rader's RPN guide if you have difficulties
 430 with the way I explained this last example.
 431
 432 If you want to check this RPN expression, just mimic RRDtool behavior:
 433
 434    For any known value, the expression evaluates as follows:
 435    CDEF:result=value,UN,0,value,IF  (value,UN) is not true so it becomes 0
 436    CDEF:result=0,0,value,IF         "IF" will return the 3rd value
 437    CDEF:result=value                The known value is returned
 438
 439    For the unknown value, this happens:
 440    CDEF:result=value,UN,0,value,IF  (value,UN) is true so it becomes 1
 441    CDEF:result=1,0,value,IF         "IF" sees 1 and returns the 2nd value
 442    CDEF:result=0                    Zero is returned
 443
 444 Of course, if you would like to see another value instead of zero, you
 445 can use that other value.
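
To mimic RRDtool without pen and paper, you could use a tiny stand-in evaluator like the one below. It is a sketch for this tutorial only: the function "rpn" is made up, it knows nothing but numbers, variables, "UN" and "IF", and it models the unknown value as NaN, which is my assumption for the example.

```python
import math

def rpn(expression, variables):
    """Tiny stand-in for a CDEF: numbers, variables, UN and IF only."""
    stack = []
    for token in expression.split(","):
        if token == "UN":
            value = stack.pop()
            stack.append(1.0 if math.isnan(value) else 0.0)
        elif token == "IF":
            if_false = stack.pop()    # third value: used when false
            if_true = stack.pop()     # second value: used when true
            condition = stack.pop()   # first value: the decision point
            stack.append(if_true if condition != 0 else if_false)
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))
    assert len(stack) == 1, "exactly one value must remain"
    return stack[0]

UNKNOWN = float("nan")   # assumption: unknown data modelled as NaN

print(rpn("value,UN,0,value,IF", {"value": 42.0}))      # 42.0
print(rpn("value,UN,0,value,IF", {"value": UNKNOWN}))   # 0.0
```

Replace the "0" in the expression with another number to see that any substitute value works the same way.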
 446
 447 Eventually, when all unknown data is removed from the RRD, you may want
 448 to remove this rule so that unknown data is properly displayed.
 449
 450 =head2 Example: better handling of unknown data, by using time
 451
 452 The above example has one drawback. If you do log unknown data in
 453 your database after installing your new equipment, it will also be
 454 translated into zero and therefore you won't see that there was a
 455 problem. This is not good and what you really want to do is:
 456
 457 =over 4
 458
 459 =item *
 460
 461 If there is unknown data, look at the time that this sample was taken.
 462
 463 =item *
 464
 465 If the unknown value is before time xxx, make it zero.
 466
 467 =item *
 468
 469 If it is after time xxx, leave it as unknown data.
 470
 471 =back
 472
 473 This is doable: you can compare the time that the sample was taken
 474 to some known time. Assuming you started to monitor your device on
 475 Friday September 17, 1999, 00:35:57 MET DST. Translate this time in seconds
 476 since 1970-01-01 and it becomes 937'521'357. If you process unknown values
 477 that were received after this time, you want to leave them unknown and
 478 if they were "received" before this time, you want to translate them
 479 into zero (so you can effectively ignore them while adding them to your
 480 other routers counters).
 481
  482 Translating Friday September 17, 1999, 00:35:57 MET DST into 937'521'357 can
  483 be done by, for instance, using GNU date (run in that same time zone):
 484
 485    date -d "19990917 00:35:57" +%s
 486
 487 You could also dump the database and see where the data starts to be
 488 known. There are several other ways of doing this, just pick one.
 489
  490 Now we have to create the magic that allows us to process unknown
  491 values differently depending on the time that the sample was taken.
 492 This is a three step process:
 493
 494 =over 4
 495
 496 =item 1.
 497
 498 If the timestamp of the value is after 937'521'357, leave it as is.
 499
 500 =item 2.
 501
 502 If the value is a known value, leave it as is.
 503
 504 =item 3.
 505
 506 Change the unknown value into zero.
 507
 508 =back
 509
  510 Let's look at part one:
 511
 512     if (true) return the original value
 513
 514 We rewrite this:
 515
 516     if (true) return "a"
 517     if (false) return "b"
 518
 519 We need to calculate true or false from step 1. There is a function
 520 available that returns the timestamp for the current sample. It is
  521 called, unsurprisingly, "TIME". This time has to be compared to
  522 a constant number, so we need "GT". The output of "GT" is true or false
 523 and this is good input to "IF". We want "if (time > 937521357) then
 524 (return a) else (return b)".
 525
  526 This process was already described thoroughly in the previous chapter
  527 so let's do it quickly:
 528
 529    if (x) then a else b
 530       where x represents "time>937521357"
 531       where a represents the original value
 532       where b represents the outcome of the previous example
 533
 534    time>937521357       --> TIME,937521357,GT
 535
 536    if (x) then a else b --> x,a,b,IF
 537    substitute x         --> TIME,937521357,GT,a,b,IF
 538    substitute a         --> TIME,937521357,GT,value,b,IF
 539    substitute b         --> TIME,937521357,GT,value,value,UN,0,value,IF,IF
 540
 541 We end up with:
 542 C<CDEF:result=TIME,937521357,GT,value,value,UN,0,value,IF,IF>
 543
 544 This looks very complex, however, as you can see, it was not too hard to
 545 come up with.
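
If the nested "IF" still looks like magic, it may help to read the same decision written as ordinary Python. This is a sketch only: the function name "result" simply mirrors the CDEF name, unknown is modelled as NaN (my assumption), and the timestamp is passed in by hand instead of coming from "TIME".

```python
import math

START = 937521357          # the timestamp from the text above
UNKNOWN = float("nan")     # assumption: unknown data modelled as NaN

def result(timestamp, value):
    """Infix version of TIME,937521357,GT,value,value,UN,0,value,IF,IF."""
    if timestamp > START:
        return value                              # outer IF, "a" branch
    return 0.0 if math.isnan(value) else value    # inner IF, "b" branch

print(result(START - 60, UNKNOWN))   # before the start: unknown becomes 0.0
print(result(START + 60, UNKNOWN))   # after the start: stays unknown (nan)
print(result(START - 60, 5.0))       # known values are never changed
```

The inner line is the CDEF from the previous example, and the outer "if" is the new time check wrapped around it.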
 546
 547 =head2 Example: Pretending weird data isn't there
 548
 549 Suppose you have a problem that shows up as huge spikes in your graph.
 550 You know this happens and why, so you decide to work around the problem.
 551 Perhaps you're using your network to do a backup at night and by doing
  552 so you get almost 10Mb/s while the rest of your network activity does
 553 not produce numbers higher than 100kb/s.
 554
 555 There are two options:
 556
 557 =over 4
 558
 559 =item 1.
 560
 561 If the number exceeds 100kb/s it is wrong and you want it masked out
 562 by changing it into unknown.
 563
 564 =item 2.
 565
 566 You don't want the graph to show more than 100kb/s.
 567
 568 =back
 569
 570 Pseudo code: if (number > 100) then unknown else number
 571 or
 572 Pseudo code: if (number > 100) then 100 else number.
 573
  574 The second "problem" may also be solved by using the rigid option of
  575 RRDtool graph, however this does not give the same result. In this example
 576 you can end up with a graph that does autoscaling. Also, if you use
 577 the numbers to display maxima they will be set to 100kb/s.
 578
 579 We use "IF" and "GT" again. "if (x) then (y) else (z)" is written
 580 down as "CDEF:result=x,y,z,IF"; now fill in x, y and z.
 581 For x you fill in "number greater than 100kb/s" becoming
 582 "number,100000,GT" (kilo is 1'000 and b/s is what we measure!).
 583 The "z" part is "number" in both cases and the "y" part is either
 584 "UNKN" for unknown or "100000" for 100kb/s.
 585
 586 The two CDEF expressions would be:
 587
 588     CDEF:result=number,100000,GT,UNKN,number,IF
 589     CDEF:result=number,100000,GT,100000,number,IF
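
To compare what the two CDEFs do to the same data, here is a rough Python equivalent. The function names "mask" and "clip" and the sample numbers are invented for this example, and unknown is again modelled as NaN.

```python
import math

LIMIT = 100000             # 100kb/s, as in the CDEFs above
UNKNOWN = float("nan")     # assumption: unknown data modelled as NaN

def mask(number):
    """number,100000,GT,UNKN,number,IF: spikes become unknown."""
    return UNKNOWN if number > LIMIT else number

def clip(number):
    """number,100000,GT,100000,number,IF: spikes are cut off at the limit."""
    return LIMIT if number > LIMIT else number

# Made-up samples in b/s; the third one is the nightly backup.
samples = [40000, 95000, 10000000, 60000]
print([mask(s) for s in samples])   # the spike turns into nan (a gap)
print([clip(s) for s in samples])   # the spike turns into 100000
```

With the first CDEF a maximum printed from the result ignores the backup entirely; with the second one the maximum will read exactly 100kb/s.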
 590
 591 =head2 Example: working on a certain time span
 592
 593 If you want a graph that spans a few weeks, but would only want to
 594 see some routers' data for one week, you need to "hide" the rest of
 595 the time frame. Don't ask me when this would be useful, it's just
 596 here for the example :)
 597
 598 We need to compare the time stamp to a begin date and an end date.
 599 Comparing isn't difficult:
 600
 601         TIME,begintime,GE
 602         TIME,endtime,LE
 603
 604 These two parts of the CDEF produce either 0 for false or 1 for true.
 605 We can now check if they are both 0 (or 1) using a few IF statements
 606 but, as Wataru Satoh pointed out, we can use the "*" or "+" functions
 607 as logical AND and logical OR.
 608
  609 For "*", the result will be zero (false) if either one of the two
  610 operands is zero.  For "+", the result will only be false (0) when
  611 two false (0) operands are added.  Warning: *any* number not
  612 equal to 0 will be considered "true". This means that, for instance,
  613 "-1,1,+" (which should be "true or true") will become FALSE ...
  614 In other words, use "+" only if you know for sure that you have positive
  615 numbers (or zero) only.
 616
 617 Let's compile the complete CDEF:
 618
  619         DEF:ds0=router1.rrd:ds_name:AVERAGE
  620         CDEF:ds0modified=TIME,begintime,GE,TIME,endtime,LE,*,ds0,UNKN,IF
  621
  622 This will return the value of ds0 if both comparisons return true. You
  623 could also do it the other way around:
  624
  625         DEF:ds0=router1.rrd:ds_name:AVERAGE
  626         CDEF:ds0modified=TIME,begintime,LT,TIME,endtime,GT,+,UNKN,ds0,IF
 627
 628 This will return an UNKNOWN if either comparison returns true.
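
The trick of using "*" as AND and "+" as OR is easy to try out in Python. In this sketch the values for begintime and endtime are arbitrary numbers picked for the example, and the function names are made up.

```python
begintime, endtime = 1000, 2000   # arbitrary sample timestamps

def inside(timestamp):
    """TIME,begintime,GE,TIME,endtime,LE,*  ("*" used as logical AND)."""
    after_begin = 1 if timestamp >= begintime else 0
    before_end = 1 if timestamp <= endtime else 0
    return after_begin * before_end

def outside(timestamp):
    """TIME,begintime,LT,TIME,endtime,GT,+  ("+" used as logical OR)."""
    before_begin = 1 if timestamp < begintime else 0
    after_end = 1 if timestamp > endtime else 0
    return before_begin + after_end

for t in (500, 1500, 2500):
    print(t, inside(t), outside(t))

# The pitfall from the warning: both inputs are "true" (not zero),
# yet their sum is 0 and would be taken for "false".
print(-1 + 1)
```

Because the comparison functions only ever produce 0 or 1, "+" is safe here; it becomes risky only when you feed it values that may be negative.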
 629
 630 =head2 Example: You suspect to have problems and want to see unknown data.
 631
 632 Suppose you add up the number of active users on several terminal servers.
 633 If one of them doesn't give an answer (or an incorrect one) you get "NaN"
 634 in the database ("Not a Number") and NaN is evaluated as Unknown.
 635
 636 In this case, you would like to be alerted to it and the sum of the
 637 remaining values is of no value to you.
 638
 639 It would be something like:
 640
 641     DEF:users1=location1.rrd:onlineTS1:LAST
 642     DEF:users2=location1.rrd:onlineTS2:LAST
 643     DEF:users3=location2.rrd:onlineTS1:LAST
 644     DEF:users4=location2.rrd:onlineTS2:LAST
 645     CDEF:allusers=users1,users2,users3,users4,+,+,+
 646
 647 If you now plot allusers, unknown data in one of users1..users4 will
 648 show up as a gap in your graph. You want to modify this to show a
 649 bright red line, not a gap.
 650
 651 Define an extra CDEF that is unknown if all is okay and is infinite if
 652 there is an unknown value:
 653
 654     CDEF:wrongdata=allusers,UN,INF,UNKN,IF
 655
 656 "allusers,UN" will evaluate to either true or false, it is the (x) part
 657 of the "IF" function and it checks if allusers is unknown.
 658 The (y) part of the "IF" function is set to "INF" (which means infinity)
 659 and the (z) part of the function returns "UNKN".
 660
 661 The logic is: if (allusers == unknown) then return INF else return UNKN.
 662
 663 You can now use AREA to display this "wrongdata" in bright red. If it
 664 is unknown (because allusers is known) then the red AREA won't show up.
 665 If the value is INF (because allusers is unknown) then the red AREA will
 666 be filled in on the graph at that particular time.
 667
 668    AREA:allusers#0000FF:combined user count
 669    AREA:wrongdata#FF0000:unknown data
 670
 671 =head2 Same example useful with STACKed data:
 672
 673 If you use stack in the previous example (as I would do) then you don't
 674 add up the values. Therefore, there is no relationship between the
 675 four values and you don't get a single value to test.
 676 Suppose users3 would be unknown at one point in time: users1 is plotted,
 677 users2 is stacked on top of users1, users3 is unknown and therefore
 678 nothing happens, users4 is stacked on top of users2.
 679 Add the extra CDEFs anyway and use them to overlay the "normal" graph:
 680
 681    DEF:users1=location1.rrd:onlineTS1:LAST
 682    DEF:users2=location1.rrd:onlineTS2:LAST
 683    DEF:users3=location2.rrd:onlineTS1:LAST
 684    DEF:users4=location2.rrd:onlineTS2:LAST
 685    CDEF:allusers=users1,users2,users3,users4,+,+,+
 686    CDEF:wrongdata=allusers,UN,INF,UNKN,IF
 687    AREA:users1#0000FF:users at ts1
 688    STACK:users2#00FF00:users at ts2
 689    STACK:users3#00FFFF:users at ts3
 690    STACK:users4#FFFF00:users at ts4
 691    AREA:wrongdata#FF0000:unknown data
 692
 693 If there is unknown data in one of users1..users4, the "wrongdata" AREA
 694 will be drawn and because it starts at the X-axis and has infinite height
 695 it will effectively overwrite the STACKed parts.
 696
 697 You could combine the two CDEF lines into one (we don't use "allusers")
  698 if you like.  But there are good reasons for writing two CDEFs:
 699
 700 =over 4
 701
 702 =item *
 703
 704 It improves the readability of the script.
 705
 706 =item *
 707
 708 It can be used inside GPRINT to display the total number of users.
 709
 710 =back
 711
 712 If you choose to combine them, you can substitute the "allusers" in the
 713 second CDEF with the part after the equal sign from the first line:
 714
 715    CDEF:wrongdata=users1,users2,users3,users4,+,+,+,UN,INF,UNKN,IF
 716
 717 If you do so, you won't be able to use these next GPRINTs:
 718
 719    COMMENT:"Total number of users seen"
 720    GPRINT:allusers:MAX:"Maximum: %6.0lf"
 721    GPRINT:allusers:MIN:"Minimum: %6.0lf"
 722    GPRINT:allusers:AVERAGE:"Average: %6.0lf"
 723    GPRINT:allusers:LAST:"Current: %6.0lf\n"
 724
 725 =head1 The examples from the RRD graph manual page
 726
 727 =head2 Degrees Celsius vs. Degrees Fahrenheit
 728
 729 To convert Celsius into Fahrenheit use the formula
 730 F=9/5*C+32
 731
 732    rrdtool graph demo.png --title="Demo Graph" \
 733       DEF:cel=demo.rrd:exhaust:AVERAGE \
 734       CDEF:far=9,5,/,cel,*,32,+ \
 735       LINE2:cel#00a000:"D. Celsius" \
 736       LINE2:far#ff0000:"D. Fahrenheit\c"
 737
 738 This example gets the DS called "exhaust" from database "demo.rrd"
 739 and puts the values in variable "cel". The CDEF used is evaluated
 740 as follows:
 741
 742    CDEF:far=9,5,/,cel,*,32,+
 743    1. push 9, push 5
 744    2. push function "divide" and process it
 745       the stack now contains 9/5
 746    3. push variable "cel"
 747    4. push function "multiply" and process it
 748       the stack now contains 9/5*cel
 749    5. push 32
 750    6. push function "plus" and process it
  751       the stack now contains the temperature in Fahrenheit
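
The six steps can also be watched one by one with a small tracing evaluator. As before this is a sketch and not RRDtool code: the function "trace" is made up for this tutorial, and 100 degrees Celsius is just a sample value for "cel".

```python
import operator

# Hypothetical helper for illustration only; not part of RRDtool.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def trace(expression, variables):
    """Evaluate an RPN expression, printing the stack after every token."""
    stack = []
    for token in expression.split(","):
        if token in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))
        print(f"{token:>4}  ->  {stack}")
    return stack[0]

# 100 degrees Celsius is a made-up sample value.
far = trace("9,5,/,cel,*,32,+", {"cel": 100.0})
```

Each printed line shows the stack right after one token has been handled, so you can compare it with the numbered steps above.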
 752
 753 =head2 Changing unknown into zero
 754
 755    rrdtool graph demo.png --title="Demo Graph" \
 756       DEF:idat1=interface1.rrd:ds0:AVERAGE \
 757       DEF:idat2=interface2.rrd:ds0:AVERAGE \
 758       DEF:odat1=interface1.rrd:ds1:AVERAGE \
 759       DEF:odat2=interface2.rrd:ds1:AVERAGE \
 760       CDEF:agginput=idat1,UN,0,idat1,IF,idat2,UN,0,idat2,IF,+,8,* \
 761       CDEF:aggoutput=odat1,UN,0,odat1,IF,odat2,UN,0,odat2,IF,+,8,* \
 762       AREA:agginput#00cc00:Input Aggregate \
 763       LINE1:aggoutput#0000FF:Output Aggregate
 764
 765 These two CDEFs are built from several functions. It helps to split
 766 them when viewing what they do. Starting with the first CDEF we would
 767 get:
 768
 769  idat1,UN --> a
 770  0        --> b
 771  idat1    --> c
 772  if (a) then (b) else (c)
 773
  774 The result is therefore "0" if it is true that "idat1" is unknown.
  775 If not, the original value of "idat1" is put back on the stack.
  776 Let's call this answer "d". The process is repeated for the next
  777 five items on the stack, in the same way, and it will return answer
  778 "h". The resulting stack is therefore "d,h".
 779 The expression has been simplified to "d,h,+,8,*" and it will now be
 780 easy to see that we add "d" and "h", and multiply the result with eight.
 781
 782 The end result is that we have added "idat1" and "idat2" and in the
 783 process we effectively ignored unknown values. The result is multiplied
 784 by eight, most likely to convert bytes/s to bits/s.

=head2 Infinity demo

   rrdtool graph example.png --title="INF demo" \
      DEF:val1=some.rrd:ds0:AVERAGE \
      DEF:val2=some.rrd:ds1:AVERAGE \
      DEF:val3=some.rrd:ds2:AVERAGE \
      DEF:val4=other.rrd:ds0:AVERAGE \
      CDEF:background=val4,POP,TIME,7200,%,3600,LE,INF,UNKN,IF \
      CDEF:wipeout=val1,val2,val3,val4,+,+,+,UN,INF,UNKN,IF \
      AREA:background#F0F0F0 \
      AREA:val1#0000FF:Value1 \
      STACK:val2#00C000:Value2 \
      STACK:val3#FFFF00:Value3 \
      STACK:val4#FFC000:Value4 \
      AREA:wipeout#FF0000:Unknown

This demo demonstrates two ways to use infinity. It is a bit tricky
to see what happens in the "background" CDEF.

   "val4,POP,TIME,7200,%,3600,LE,INF,UNKN,IF"

This RPN takes the value of "val4" as input and then immediately
removes it from the stack using "POP". The stack is now empty, but
as a side effect we now know the time at which this sample was taken.
This time is put on the stack by the "TIME" function.

"TIME,7200,%" takes the modulo of the time and 7200 (two hours,
expressed in seconds). The resulting value on the stack will be a
number in the range from 0 to 7199.

For people who don't know the modulo function: it is the remainder
after an integer division. If you divide 16 by 3, the answer would
be 5 and the remainder would be 1. So, "16,3,%" returns 1.

We have the result of "TIME,7200,%" on the stack; let's call this
"a". The start of the RPN has become "a,3600,LE" and this checks
whether "a" is less than or equal to 3600. That is true for roughly
half of the time: the first hour of every two-hour period.
We now have to process the rest of the RPN, which is only a simple
"IF" function that returns either "INF" or "UNKN" depending on the
time. The result is stored in the variable "background".
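
To see the alternating pattern for yourself, here is a small Python
sketch of the same calculation. The function name and the sample
timestamps are made up, and NaN is used as a stand-in for "UNKN".

```python
import math


def background(timestamp):
    # Mirrors "TIME,7200,%,3600,LE,INF,UNKN,IF"
    a = timestamp % 7200          # position inside the two-hour period
    return math.inf if a <= 3600 else math.nan


# Sample timestamps in seconds, one every 30 minutes (hypothetical)
for t in range(0, 4 * 3600 + 1, 1800):
    print(t, background(t))
```

The output switches between infinity and unknown from one hour to the
next, which is what produces the striped background in the graph.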

The second CDEF has been discussed earlier in this document so we
won't repeat that here.

Now you can draw the different layers. Start with the background
that is either unknown (nothing to see) or infinite (the whole
positive part of the graph gets filled).

Next you draw the data on top of this background, so that it overlays
the background. Suppose one of val1..val4 is unknown. In that
case you end up with only three bars stacked on top of each other.
You don't want to see this because the data is only valid when all
four variables are valid. This is why you use the second CDEF: it
overlays the data with an AREA so the data cannot be seen anymore.

If your data can also have negative values you also need to overwrite
the other half of your graph. This can be done in a relatively simple
way: take the "wipeout" variable and multiply it by minus one:
"CDEF:wipeout2=wipeout,-1,*"
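
As a rough Python sketch of both wipeout variables (again with NaN
standing in for unknown, finite sample values, and made-up function
names):

```python
import math


def wipeout(val1, val2, val3, val4):
    # Mirrors "val1,val2,val3,val4,+,+,+,UN,INF,UNKN,IF".
    # The sum is NaN as soon as one of the (finite) inputs is NaN.
    total = val1 + val2 + val3 + val4
    return math.inf if math.isnan(total) else math.nan


def wipeout2(w):
    # Mirrors "wipeout,-1,*": flips +infinity into -infinity
    return w * -1


print(wipeout(1.0, 2.0, math.nan, 4.0))            # one value unknown
print(wipeout(1.0, 2.0, 3.0, 4.0))                 # everything known
print(wipeout2(wipeout(1.0, 2.0, math.nan, 4.0)))  # lower half of the graph
```

When everything is known both variables stay unknown, so nothing is
drawn over the data.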

=head2 Filtering data

You may do some complex data filtering:

  MEDIAN FILTER: filters shot noise
  (note: "median" here is really the mean of three consecutive values)

    DEF:var=database.rrd:traffic:AVERAGE
    CDEF:prev1=PREV(var)
    CDEF:prev2=PREV(prev1)
    CDEF:prev3=PREV(prev2)
    CDEF:median=prev1,prev2,prev3,+,+,3,/
    LINE3:median#000077:filtered
    LINE1:prev2#007700:'raw data'
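
What this filter does to a single spike can be sketched in Python.
The helper "prev" imitates PREV(x) by shifting a list one step and
putting an unknown (NaN here) in the first position; the helper and
the sample data are made up for this illustration.

```python
import math


def prev(series):
    # Value of the series one step earlier; unknown (NaN) at the first step
    return [math.nan] + series[:-1]


var = [10.0, 10.0, 100.0, 10.0, 10.0, 10.0]  # one spike of "shot noise"
prev1 = prev(var)
prev2 = prev(prev1)
prev3 = prev(prev2)

# Mirrors "prev1,prev2,prev3,+,+,3,/"
filtered = [(a + b + c) / 3 for a, b, c in zip(prev1, prev2, prev3)]

for raw, smooth in zip(prev2, filtered):
    print(raw, smooth)
```

The spike is spread out over three steps at a lower height, and the
first three results are unknown because there is no earlier data to
look back at. Plotting "prev2" as the raw data keeps it lined up with
the middle of the three values that were averaged.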


  DERIVATIVE:

    DEF:var=database.rrd:traffic:AVERAGE
    CDEF:prev1=PREV(var)
    CDEF:time=var,POP,TIME
    CDEF:prevtime=PREV(time)
    CDEF:derivate=var,prev1,-,time,prevtime,-,/
    LINE3:derivate#000077:derivative
    LINE1:var#007700:'raw data'
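
In plain words, the derivative is the change in value divided by the
change in time between two neighbouring samples. A Python sketch with
made-up sample data (one sample every 300 seconds, NaN standing in for
unknown):

```python
import math


def prev(series):
    # Value of the series one step earlier; unknown (NaN) at the first step
    return [math.nan] + series[:-1]


times = [0, 300, 600, 900]        # hypothetical sample times in seconds
var = [0.0, 150.0, 450.0, 450.0]  # hypothetical sample values

# Mirrors "var,prev1,-,time,prevtime,-,/"
derivative = [
    (v - pv) / (t - pt)
    for v, pv, t, pt in zip(var, prev(var), times, prev(times))
]
print(derivative)
```

The first result is unknown because there is no earlier sample, and a
value that stays the same gives a derivative of zero.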


=head1 Out of ideas for now

This document was created from questions asked by either myself or by
other people on the RRDtool mailing list. Please let me know if you
find errors in it or if you have trouble understanding it. If you
think there should be an addition, mail me:
E<lt>alex@vandenbogaerdt.nlE<gt>

Remember: B<No feedback equals no changes!>

=head1 SEE ALSO

The RRDtool manpages

=head1 AUTHOR

Alex van den Bogaerdt
E<lt>alex@vandenbogaerdt.nlE<gt>