This post presents Datadog's approach to automating piecewise regression on time series data.

Piecewise regression can mean slightly different things in different contexts, so let's take a minute to clarify what exactly we are trying to achieve with our piecewise regression algorithm.

**Automated breakpoint detection.** In classical statistics literature, piecewise regression is often suggested during manual regression analysis work, where it's obvious to the naked eye where one linear trend gives way to another. In that case, a human can specify the breakpoint between piecewise segments, split the dataset, and perform a linear regression on each segment independently. In our use cases, we want to do hundreds of regressions per second, and it's not feasible to have a human specify all breakpoints. Instead, we have the requirement that our piecewise regression algorithm identifies its own breakpoints.

**Automated segment count detection.** If we were to know that a data set has exactly two segments, we could easily look at each of the possible pairs of segments. However, in practice, when given an arbitrary time series, there's no reason to believe that there must be more than one segment, or fewer than 3 or 4 or 5. Our algorithm must choose an appropriate number of segments without it being user-specified.

Some methods for piecewise regression generate connected segments, where each segment's end point is the next segment's start point. We impose no such requirement on our algorithm.

The most obvious challenge to implementing a piecewise regression with automated breakpoint detection is the large size of the solution search space; a brute-force search of all the possible segmentations is prohibitively expensive. The number of ways a time series can be broken into segments is exponential in the length of the time series. While dynamic programming can be used to traverse this search space much more efficiently than a naive brute-force implementation, it's still too slow in practice. A greedy heuristic is needed to quickly discard large areas of the search space.

A more subtle challenge is that we need some way of comparing the quality of one solution to another. For our version of the problem, with unknown numbers and locations of segments, we need to compare potential solutions with different numbers of segments and different breakpoints. We find ourselves trying to balance two competing goals:

1. Use the fewest number of segments possible.
2. Model the data well. That is, make each segment's regression line close to the observed data points.

We could always get zero error by creating a single segment for each point (or even one segment for every two points). Yet, that would defeat the whole point of doing a regression: we wouldn't learn anything about the general relationship between a timestamp and its associated metric value, and we couldn't easily use the result for interpolation or extrapolation.

We use the following greedy algorithm for the piecewise regression problem:
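To make the segment-count/fit-error trade-off concrete, here is a minimal, illustrative sketch of one possible greedy bottom-up scheme in pure Python: start with many tiny segments (which have zero error) and repeatedly merge the adjacent pair whose shared regression line increases total squared error the least, stopping once the cheapest merge costs more than a fixed penalty. The function names (`fit_sse`, `greedy_piecewise`) and the flat `penalty` threshold are assumptions for illustration only; this is not Datadog's actual algorithm or tuning.

```python
# Illustrative sketch only, not Datadog's production algorithm.

def fit_sse(xs, ys):
    """Fit y = a*x + b by ordinary least squares; return (a, b, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    b = mean_y - a * mean_x
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def greedy_piecewise(xs, ys, penalty=1.0):
    """Greedily merge adjacent segments until merging gets too costly.

    Returns a list of (start, end) index pairs, end-exclusive.
    """
    # Start with one segment per consecutive pair of points (zero error).
    segs = [(i, min(i + 2, len(xs))) for i in range(0, len(xs), 2)]
    while len(segs) > 1:
        # Cost of merging segments i and i+1: how much the total squared
        # error grows if they must share a single regression line.
        costs = []
        for i in range(len(segs) - 1):
            (lo, mid), (_, hi) = segs[i], segs[i + 1]
            merged_sse = fit_sse(xs[lo:hi], ys[lo:hi])[2]
            split_sse = (fit_sse(xs[lo:mid], ys[lo:mid])[2]
                         + fit_sse(xs[mid:hi], ys[mid:hi])[2])
            costs.append((merged_sse - split_sse, i))
        best_cost, i = min(costs)
        if best_cost > penalty:  # cheapest merge hurts fit too much: stop
            break
        segs[i:i + 2] = [(segs[i][0], segs[i + 1][1])]
    return segs

# Example: a series that rises and then falls should end up as two
# segments, while a perfectly linear series collapses to one.
xs = list(range(10))
print(greedy_piecewise(xs, [0, 1, 2, 3, 4, 4, 3, 2, 1, 0]))
print(greedy_piecewise(xs, [2 * x for x in xs]))
```

The `penalty` parameter is what arbitrates between the two competing goals above: a higher penalty tolerates more fit error per segment and therefore produces fewer segments.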