Time-Series Anomaly Detection: A Decade Review

(arxiv.org)

430 points | by belter 1 day ago

20 comments

  • bluechair 1 day ago
    Didn’t see it mentioned but good to know about: UCR matrix profile.

    The Matrix Profile is honestly one of the most underrated tools in the time series analysis space - it's ridiculously efficient. The killer feature is how it just works for finding motifs and anomalies without having to mess around with window sizes and thresholds like you do with traditional techniques. Solid across domains too, from manufacturing sensor data to ECG analysis to earthquake detection.

    https://www.cs.ucr.edu/~eamonn/MatrixProfile.html
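
    For anyone who wants to try it, here is a minimal discord-finding sketch. It assumes the third-party stumpy library as one common MP implementation (my choice, not something the UCR page prescribes), and the subsequence length m is still a parameter you pick:

      import numpy as np
      import stumpy

      T = np.random.randn(10_000)              # your univariate series
      T[5_000:5_050] += 5                      # inject an anomaly for illustration
      m = 100                                  # subsequence (window) length

      mp = stumpy.stump(T, m)                  # column 0 holds the matrix profile values
      discord = int(np.argmax(mp[:, 0].astype(float)))
      print("most anomalous subsequence starts at index", discord)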

    • jmpeax 23 hours ago
      What do you mean you don't have to mess around with window sizes? Matrix profile is highly dependent on the window size.
      • eamonnkeogh 6 hours ago
        The MP is so efficient that you can test ALL window lengths at once! This is called MADRID [a].

        [a] Matrix Profile XXX: MADRID: A Hyper-Anytime Algorithm to Find Time Series Anomalies of all Lengths. Yue Lu, Thirumalai Vinjamoor Akhil Srinivas, Takaaki Nakamura, Makoto Imamura, and Eamonn Keogh. ICDM 2023.

    • eamonnkeogh 6 hours ago
      Thank you for your kind words ;-)
    • sriram_malhar 18 hours ago
      Thanks for sharing; I am most intrigued by the sales pitch. But the website is downright ugly.

      This is a better presentation by the same folks. https://matrixprofile.org/

      • Croftengea 14 hours ago
        I don't think it's being updated. The latest blog posts are from 2020, and the GitHub repos haven't seen commits for the last 5-6 years. MP has come a long way since then.
      • eskaytwo 17 hours ago
        I don’t think it’s the same people.
        • sriram_malhar 16 hours ago
          Ah, you are right. I got the link from the original URL, so I just assumed. Thanks for the correction.
          • willvarfar 15 hours ago
            What is the relationship? And why is there a foundation for something like this?
      • hoseja 16 hours ago
        Are you being serious? The first page actually has information on it. You can add margins in the devtools.
    • Croftengea 14 hours ago
      MP is one of the best univariate methods, but it's actually mentioned in the article.
    • bee_rider 1 day ago
      What does it do? Anything to do with matrices, like, from math?
  • quijoteuniv 1 day ago
    I use the offset function in Prometheus to build an average of past weeks as a recording rule. We have a usage pattern in our systems that is very "seasonal", as in weekly cycles, so I take the average of a metric offset by 1, 2, 3, and 4 weeks ((offset 1w + offset 2w + offset 3w + offset 4w) / 4) and compare it to the current value of that metric. That way the alarms can be set day or night, weekday or weekend, and the thresholds are dynamic: the comparison is against an average for that day of the week, or time of day. Someone at GitLab posted a more in-depth explanation of this way of working: https://about.gitlab.com/blog/2019/07/23/anomaly-detection-u... Things get a bit more complicated with holidays, but you can actually program them into Prometheus: https://promcon.io/2019-munich/slides/improved-alerting-with...
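
    For illustration, here's a rough Python sketch of the same idea outside Prometheus, on made-up 5-minute data; the 4-week average and the 3-sigma band are my assumptions, not GitLab's exact rule:

      import numpy as np
      import pandas as pd

      # Hypothetical 5-minute metric with a weekly cycle, six weeks of history.
      idx = pd.date_range("2024-01-01", periods=6 * 7 * 24 * 12, freq="5min")
      values = 100 + 20 * np.sin(2 * np.pi * idx.dayofweek / 7) + np.random.randn(len(idx))
      metric = pd.Series(values, index=idx)

      week = 7 * 24 * 12                                   # samples per week
      # Seasonal baseline: average of the same moment 1-4 weeks ago.
      baseline = sum(metric.shift(k * week) for k in range(1, 5)) / 4
      residual = metric - baseline
      band = 3 * residual.rolling(week).std()              # dynamic threshold
      anomalies = metric[residual.abs() > band]
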
    • gr3ml1n 1 day ago
      Whenever I have a chart in Grafana that isn't too dense, I almost always add a line for the 7d offset value. Super useful to tell what's normal and what isn't.
    • CubsFan1060 1 day ago
      Gitlab also has this: https://gitlab.com/gitlab-com/gl-infra/tamland

      I'm not really smart in these areas, but it feels like forecasting and anomaly detection are pretty related. I could be wrong though.

      • diab0lic 1 day ago
        You are not wrong! An entire subclass of anomaly detection can basically be reduced to: forecast the next data point and then measure the forecast error when the data point arrives.
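
        As a toy sketch of that subclass (the exponentially weighted forecast and the 3-sigma cut-off are just illustrative choices, not the only ones):

          import numpy as np

          def forecast_error_anomalies(x, alpha=0.3, k=3.0):
              """Flag points whose one-step-ahead forecast error is unusually large."""
              forecast = x[0]
              errors, flags = [], [False]
              for value in x[1:]:
                  err = value - forecast
                  sigma = np.std(errors) if len(errors) > 10 else np.inf
                  flags.append(abs(err) > k * sigma)                  # anomalous if the error is extreme
                  errors.append(err)
                  forecast = alpha * value + (1 - alpha) * forecast   # simple EWMA forecast
              return np.array(flags)
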
        • fnordpiglet 1 day ago
          Well, it doesn't really require a forecast: variance-based anomaly detection doesn't assert what the next point will be, only that its change stays within some band. Such models usually can't be used to make a forecast other than the band bounds.
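
          Something like this, as a toy example (the window length and k are arbitrary choices):

            import numpy as np

            def step_band_anomalies(x, window=100, k=4.0):
                """Flag points whose step change exceeds k sigma of recent step changes.

                No forecast of the next value is made; only the size of the change is bounded.
                """
                x = np.asarray(x, dtype=float)
                steps = np.abs(np.diff(x))
                flags = np.zeros(len(x), dtype=bool)
                for i in range(window + 1, len(x)):
                    sigma = steps[i - 1 - window:i - 1].std()
                    flags[i] = steps[i - 1] > k * sigma
                return flags
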
          • diab0lic 1 day ago
            That would be a different subclass of anomaly detection solutions.
      • fraserphysics 8 hours ago
        If you need to detect anomalies as soon as they occur, that seems right. But if you want to detect them later you can also combine back-casting with forecasting. Like Kalman smoothing.
  • mikehollinger 1 day ago
    This doesn’t capture work that’s happened in the last year or so.

    For example, some former colleagues' time-series foundation model (Granite TS) was doing pretty well when we were experimenting with it. [1]

    An aha moment for me was realizing that you can think of anomaly models as effectively forecasting the next N steps, then noticing when the actual measured values are "different enough" from the expected ones. This is simple to draw on a whiteboard for one signal, but when it's multivariate it's pretty neat that it works.

    [1] https://huggingface.co/ibm-granite/granite-timeseries-ttm-r1

    • 0cf8612b2e1e 1 day ago
      My similar recognition was when I read about isolation forests for outlier detection[0]. When predictions are different from the average, something is off.

      [0] https://scikit-learn.org/stable/modules/generated/sklearn.en...
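
      A minimal sketch of that scikit-learn estimator on windowed series data (the window length and contamination rate are arbitrary choices here):

        import numpy as np
        from sklearn.ensemble import IsolationForest

        rng = np.random.default_rng(0)
        series = np.sin(np.linspace(0, 60, 3000)) + 0.1 * rng.standard_normal(3000)
        series[1500:1510] += 2                          # injected anomaly

        w = 24                                          # window length
        windows = np.lib.stride_tricks.sliding_window_view(series, w)
        clf = IsolationForest(contamination=0.01, random_state=0).fit(windows)
        labels = clf.predict(windows)                   # -1 marks anomalous windows
        print(np.where(labels == -1)[0][:10])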

    • tessierashpool9 1 day ago
      what were you thinking then before your aha moment? :D
      • mikehollinger 1 day ago
        > what were you thinking then before your aha moment? :D

        My naive view was that there was some sort of “normalization” or “pattern matching” that was happening. Like - you can look at a trend line that generally has some shape, and notice when something changes or there’s a discontinuity. That’s a very simplistic view - but - I assumed that stuff was trying to do regressions and notice when something was out of a statistical norm like k-means analysis. Which works, sort of, but is difficult to generalize.

        • tessierashpool9 12 hours ago
          > Like - you can look at a trend line that generally has some shape, and notice when something changes or there’s a discontinuity.

          What you describe here is effectively forecasting: a model of what is expected to happen, and then noticing a deviation from it.

          • naijaboiler 10 hours ago
            To me it's always amazing how people look at what's evidently obvious to me and say it's profound.
            • tessierashpool9 10 hours ago
              especially if they are self-assessed "distinguished engineers and master inventors"
    • apwheele 1 day ago
      Care to share the contexts in which someone needs a zero-shot model for time series? I have just never come across one in which you don't have some historical data to fit a model and go from there.
      • delusional 1 day ago
        In this case I don't think zero-shot means no context. I think it's more used in relation to fine-tuning the model parameters over your data.

        > TTM-1 currently supports 2 modes:

        > Zeroshot forecasting: Directly apply the pre-trained model on your target data to get an initial forecast (with no training).

        > Finetuned forecasting: Finetune the pre-trained model with a subset of your target data to further improve the forecast

  • Dowwie 1 day ago
    In the nascent world of water tech, there are IoT devices that monitor water flow. These devices can detect leaks and estimate fixture-level water consumption. Leak detection is all about identifying time-series outliers. The distribution-based anomaly detection mentioned in the paper is relevant for leak detection. Interestingly, a residence may require multiple distributions due to pipe temperature variations between warm and cold seasons.
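
    As a toy sketch of the multiple-distribution point (the metric, numbers, and one-sided test below are hypothetical, not from the paper):

      import numpy as np
      from scipy import stats

      # Hypothetical per-season baselines for nightly minimum flow (L/min).
      flow = {"summer": np.random.normal(0.20, 0.05, 1000),
              "winter": np.random.normal(0.35, 0.08, 1000)}
      params = {s: (v.mean(), v.std()) for s, v in flow.items()}

      def leak_suspect(season, reading, p_threshold=1e-3):
          mu, sigma = params[season]
          # One-sided: unusually *high* night-time flow suggests a leak.
          return stats.norm.sf(reading, mu, sigma) < p_threshold

      print(leak_suspect("summer", 0.6))                # True: far above the summer baseline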
  • montereynack 1 day ago
    Gonna throw in my hat here, time series anomaly detection for industrial machinery is the problem my startup is working on! Specifically we’re making it work offline-by-default (we integrate the AI with the equipment, and don’t send data to any third party servers - even ours) because we feel there’s a ton of customer opportunities that get left in the dust because they can’t be online. If you or someone you know is looking for a monitoring solution for industrial machinery, or are passionate about security-conscious industrial software (we also are developing a data historian) let’s talk! www.sentineldevices.com
  • zaporozhets 1 day ago
    I recently tried to homebrew some anomaly detection work for a performance tracking project and was surprised at the absence of any off-the-shelf OSS or Paid solutions in this space (that weren’t super basic or way too complex). Lots of fertile ground here!
    • rad_gruchalski 1 day ago
      There's a ton of material related to anomaly detection with Prometheus and Grafana stack: https://grafana.com/blog/2024/10/03/how-to-use-prometheus-to.... But maybe this is the "way too complex" case you mention.
    • CubsFan1060 1 day ago
      I'm still playing around with this one: https://grafana.com/blog/2024/10/03/how-to-use-prometheus-to... (there's a github repo for it).

      So far, it's not terrible, but has some pretty big flaws.

      • jcreixell 1 day ago
        Hi, co-author of the blog post here. I would love to learn more about the flaws you see, and any ideas on how to improve it! We definitely plan to iterate on it and make it as good as we possibly can.
        • CubsFan1060 1 day ago
          To be clear "some big flaws" was probably overstating it. I'm going to edit that. Also, thanks for the work on this. I would absolutely love to contribute, but my maths are not good enough for this :)

          The biggest thing I've run into in my testing is that an anomaly of reasonably short timeframe seems to throw the upper and lower bands off for quite some time.

          That being said, perhaps changing some of the variables would help with that, and I just don't have enough skill to be able to understand the exact way to adjust that.

          • jcreixell 12 hours ago
            Thank you for the feedback! There is an issue (https://github.com/grafana/promql-anomaly-detection/issues/7) discussing approaches to improve this by introducing a decay function. This is especially relevant for stable series with occasional short, large spikes. Nothing conclusive yet, but hopefully something good will come out of it!
        • pnathan 1 day ago
          The number of manual tweaks the approach requires suggests that it is essentially an ad hoc experimental fit, rather than a stable theoretical model that can adapt to your time series.
        • nyrikki 1 day ago
          Not really related to the above post, but one thing I am not seeing on an initial pass is the advancement in understanding of problems like riddled or Wada basins.

          Especially with time delays and 3+ attractors, this can be problematic.

          A simple example:

          https://doi.org/10.21203/rs.3.rs-1088857/v1

          There are tools to try to detect these features that were found over the past few decades, and I know I wasted a few years on a project that superficially looked like an FP issue but ended up being a mix of the Wada property and/or porous sets.

          The complications with describing these worse-than-traditional-chaos, indeterminate situations may make it inappropriate for you.

          But it would be nice if visibility were increased. Funnily enough, most LLMs' corpus on this is mostly fed from an LSAT question.

          There has been a lot of movement here when you have n>=3 attractors/exits.

          Not solutions unfortunately, but tools to help figure out when you hit it.

      • hackernewds 18 hours ago
        Anything in Grafana is inherently not exportable to code, though, which is rather annoying because their UI really sucks.
        • davkal 13 hours ago
          Hi! I work on the Grafana OSS team. We added some more export options recently (dashboards have a big Export button at the top right; panels can export their data via the panel menu / Inspect / Data), try it on our demo page: https://play.grafana.org/d/000000003/graphite3a-sample-websi...

          Could you describe your use case around "exportable to any code" a bit more?

    • ramon156 1 day ago
      I needed TS anomaly detection for my internship because we needed to track when a machine/server was doing poorly or had unplanned downtime. I expected Microsoft's C# library to be able to do this, but my god, it's a mess. If someone has the time and will to implement a proper library, that would be awesome.
      • mr_toad 6 hours ago
        What you’re probably after is called statistical process control. There are Python libraries like pyspc, but the theory is simple enough that you could write your own pretty easily.
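
        A minimal hand-rolled version of the classic individuals chart (k=3 is the conventional choice, not a universal one):

          import numpy as np

          def shewhart_limits(baseline, k=3.0):
              """Control limits from a stretch of data believed to be in control."""
              mu, sigma = np.mean(baseline), np.std(baseline)
              return mu - k * sigma, mu + k * sigma

          baseline = np.random.normal(50, 2, size=500)
          lo, hi = shewhart_limits(baseline)
          new_points = np.array([49.5, 51.0, 62.3])
          print((new_points < lo) | (new_points > hi))   # only the last point is out of control
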
      • neonsunset 1 day ago
        Anomaly detection in time-series data is not a concern of the standard library of all things. Nor is it a concern of "base abstractions" shipped as extensions (think ILogger).
        • Phurist 1 day ago
          If only life was as simple as calling .isAnomaly() on anything
          • neonsunset 1 day ago
            Hardcoded to return 'false' of course. Because nothing ever happens!
            • sam_bristow 16 hours ago
              Just advertise it as guaranteed 0% false positive detections and you're good-to-go!
    • jeffbee 1 day ago
      The reason there are no off-the-shelf solutions is that this is an unsolved problem. There is no approach that is generally useful.
      • otterley 6 hours ago
        Perhaps not, but an efficient, multi-language library of different functions would allow for relatively easy implementation and experimentation.
    • phirschybar 1 day ago
      Agreed. At my company we ended up rolling our own system, but this area is absolutely ripe for some configurable SaaS or OSS tool with advanced reporting and alerting mechanisms. Datadog has a decent offering, but it's pretty $$$$.
      • montereynack 1 day ago
        Gonna throw in my hat and say that if you’re working on industrial applications (like energy or manufacturing) give us a holler at www.sentineldevices.com! Plug-and-play time series monitoring for industrial applications is exactly what we do.
    • hackernewds 18 hours ago
      There's always Prophet: forecast the next value and look at the difference.
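
      Roughly like this, assuming the prophet package (the daily toy data and the use of the default uncertainty interval as the anomaly band are my choices):

        import numpy as np
        import pandas as pd
        from prophet import Prophet

        df = pd.DataFrame({"ds": pd.date_range("2023-01-01", periods=400, freq="D")})
        df["y"] = 10 + np.sin(np.arange(400) / 7) + 0.2 * np.random.randn(400)
        df.loc[390, "y"] += 5                             # injected anomaly

        m = Prophet().fit(df)
        forecast = m.predict(df[["ds"]])                  # in-sample fit with uncertainty bands
        merged = df.merge(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]], on="ds")
        anomalies = merged[(merged.y < merged.yhat_lower) | (merged.y > merged.yhat_upper)]
        print(anomalies[["ds", "y", "yhat"]])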
  • jorl17 1 day ago
    I have a soft spot for this area. Almost 10 years ago, my Masters touched on something somewhat adjacent to this (Online Failure Prediction): https://estudogeral.uc.pt/handle/10316/99218

    We built a system to detect exceptions before they happened, and act on them, hoping that this would be better than letting them happen (e.g. preemptively slow down the rate of requests instead of leading to database exhaustion)

    At the time, I felt that there was soooooooo much to do in the area, and I'm kinda sad I never worked on it again.

  • djoldman 1 day ago
    > Unfortunately, inherent complexities in the data generation of these processes, combined with imperfections in the measurement systems as well as interactions with malicious actors, often result in abnormal phenomena. Such abnormal events appear subsequently in the collected data as anomalies.

    This is critical, and difficult to deal with in many instances.

    > With the term anomalies we refer to data points or groups of data points that do not conform to some notion of normality or an expected behavior based on previously observed data.

    This is a key problem or perhaps the problem: rigorously or precisely defining what an anomaly is and is not.

  • hazrmard 1 day ago
    Anomaly detection (AD) can arguably be a value-add to any industry. It may not be a core product, but AD can help optimize operations for almost anyone.

    * Manufacturing: Computer vision to pick anomalies off the assembly line.

    * Operation: Accelerometers/temperature sensors w/ frequency analysis to detect onset of faults (prognostics / diagnostics) and do predictive maintenance.

    * Sales: Timeseries analyses on numbers / support calls to detect up/downticks in cashflows, customer satisfaction etc.

  • Imanari 1 day ago
    Look up Eamonn Keogh, he has lots of interesting work on TSAD.
  • mlepath 19 hours ago
    The process-centric taxonomy in this paper is one of the most structured frameworks I’ve seen for anomaly detection methods. It breaks down approaches into distance-based, density-based, and prediction-based categories. In practice (been doing time series analysis professionally for 8+ years), I’ve found that prediction-based methods (e.g., reconstruction errors in autoencoders) are fantastic for semi-supervised use cases but fall short for streaming data.
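
    A toy version of the reconstruction-error idea, using scikit-learn's MLPRegressor as a stand-in autoencoder (real setups use proper sequence models; the data and sizes here are made up):

      import numpy as np
      from sklearn.neural_network import MLPRegressor

      rng = np.random.default_rng(1)
      t = np.linspace(0, 2 * np.pi, 32)
      # "Normal" windows: noisy sine waves with random phase.
      normal = np.array([np.sin(t + p) for p in rng.uniform(0, 2 * np.pi, 2000)])
      normal += 0.05 * rng.standard_normal(normal.shape)

      ae = MLPRegressor(hidden_layer_sizes=(4,), max_iter=1000, random_state=0)
      ae.fit(normal, normal)                       # train to reconstruct normal windows only

      ok = np.sin(t)
      bad = np.sin(t)
      bad[16] += 3.0                               # a spike the model has never seen
      test = np.vstack([ok, bad])
      errors = np.mean((ae.predict(test) - test) ** 2, axis=1)
      print(errors)                                # the spiked window scores a much larger error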
  • itissid 1 day ago
    Can someone explain to me how SVMs are being classified in this paper as "Distribution-Based"? This is quite confusing as a taxonomy. They generally don't estimate densities, either model-free (kernel density estimates) or model-based (separating one or more possibly overlapping normal distributions).

    I get that they could be explicitly modeling a data-generating process's probability itself (just like a NN), like a Bernoulli (whose ML loss is cross-entropy) or a Normal (whose ML loss is mean squared error), but I don't think that is what the author meant by a distribution.

    My understanding is that they don't make distributional assumptions about the random variable (your Y or X) they are trying to find a max margin for.

  • lmc 8 hours ago
    It would be useful to see some discussion of sampling regularity, e.g., whether some of these methods can be used with unevenly spaced time series. I work with satellite image time series, and clouds mean my usually-weekly series can sometimes be missing points for months. We often employ interpolation, but that can be a major source of error.
  • leeoniya 1 day ago
    a colleague is doing a FOSDEM 2025 talk about https://github.com/grafana/augurs
    • eskaytwo 17 hours ago
      Thanks for the pointer. Augurs looks really promising. If the matrix profile method were included, it would be a nice alternative to the Numba JIT methods that are commonplace.
  • mathewshen 1 day ago
    Very surprised to see this paper here, and it deserves it! I have been following the work of Dr. Boniol since 2021 (via the Series2Graph paper). Series2Graph is a very good algorithm that works well in some complex situations, and his later works, like New Trends in Time-Series Anomaly Detection, TSB-UAD, Theseus, and k-Graph, are very insightful too.
    • mathewshen 1 day ago
      If you want to see more algorithms/systems used at industry companies like Twitter/Microsoft/Amazon/LinkedIn/IBM/..., you can see my notes here (the source page is in Chinese; I translated it into English using Google Translate): https://datahonor-com.translate.goog/odyssey/aiops/tsad/pape...
    • countzro 1 day ago
      I also liked the main idea of Series2Graph but found the implementation a bit complicated.

      There is a similar algorithm with a simpler implementation in this paper: „GraphTS: Graph-represented time series for subsequence anomaly detection“ https://pmc.ncbi.nlm.nih.gov/articles/PMC10431630/

      The approach is for univariate time series and I found it to perform well (with very minor tweaks).

      • mathewshen 22 hours ago
        Agreed. Thanks for the recommendation; I'd like to read the GraphTS paper.
  • whatever1 1 day ago
    Sometimes HN just reads my mind. This was exactly the topic I was looking into this week.
  • brainwipe 1 day ago
    Wonderful! My PhD was in stream anomaly detection using dynamic neural networks in 2003. Can't wait to go deep through this paper and find out what the latest thinking is. Thanks, OP.
  • eth0up 1 day ago
    I had not known of Time Series (or most other) anomaly detection methods until recently, when I used several LLMs to assist with an analysis of the Florida Lottery Pick4 history.

    For years, I'd been casually observing the daily numbers (2 draws daily for each since around ?/?/2004?, and 1 prior), which are Pick2, 3, 4, and 5, but mostly Pick4, which is 4 digits and thus has 1:10,000 odds, vs 1:100, 1:1,000 and 1:100,000 for the others.

    With truly random numbers, it is pretty difficult to identify anything but glaring anomalies. Among the tests performed were: clusters (daily/weekly), isolation forest, popular permutations (by date, special holidays, etc.), individual digits/deviations, temporal frequency, DBSCAN, z-score, patterns, correlation, external factors, autocorrelation by cluster, predictive modeling, chi-squared, time series... and a few more I've forgotten.

    For those wondering why I'd do this, around 2023-23, the FL Lottery drastically modified their website. Previously, one could enter a number for the game of their choice and receive all historical permutations of that number over all years, going back to the 1990s. With the new modification, the permutations have been eliminated and the history only shows for 2 years. The only option for the complete history is to download the provided PDF -- however, it is full of extraneous characters and cannot be readily searched via Ctrl-F, etc. Processing this PDF involves extensive character removal to render it parsable or modestly readable. So to restore the previously functional search ability, manual work is required. The seemingly deliberate obfuscation, or obstruction, was my motivation. The perceived anomalies over the years were secondary, as I am capable of little more than speculation without proper testing. But those two factors intrigued me.

    Having no background in math and only feeble abilities in programming, this was a task I could not have performed without LLMs and the Python code used for the tests. The analysis is still incomplete, having increased in complexity as I progressed until I was too tired to persist. The results were ultimately within acceptable ranges of randomness, but some patterns were present. I had made files of: all numbers that ever occurred; all numbers that have never occurred; the popularity of single, isolated digits (I was actually correct in my intuition here, which showed certain single digits were occurring with lower or higher frequencies than would be expected); and a script to apply Optical Character Recognition to the website and append the latest results to a living text and PDF file, to offer anyone interested an opportunity to freely search, parse and analyze the numbers. But I couldn't quite wangle the OCR successfully.

    Working with a set of over 60k individual number sets, looking for anomalies over a 30-year period: if there are other methods anyone would suggest, please offer them and I might resume this abandoned project.

    • khafra 16 hours ago
      Traditional anomaly detection is unlikely to find the signature of a bad pseudo-random number generator. You probably want something more like the NIST randomness test suite, the pyEntropy library, or mathy stuff like linear congruential generator analysis or testing for specific Mersenne Twisters.
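
      For a flavor of what those suites do, here's a hand-rolled version of just the first NIST SP 800-22 test (the frequency/monobit test); the digit-to-bit reduction below is a crude stand-in, not part of the standard:

        import math
        import numpy as np

        def monobit_pvalue(bits):
            """A tiny p-value means the ones/zeros balance is unlike a fair random source."""
            n = len(bits)
            s = np.sum(2 * np.asarray(bits) - 1)        # map 0/1 -> -1/+1 and sum
            return math.erfc(abs(s) / math.sqrt(n) / math.sqrt(2))

        draws = np.random.randint(0, 10, size=60_000)   # stand-in for the Pick 4 digit history
        print(monobit_pvalue(draws % 2))                # parity of each digit as one bit
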
      • eth0up 11 hours ago
        I'd be surprised if a duck-billed tardigrade didn't write the works of Shakespeare, unabridged, before I was able to do this. I will at least explore the concepts though.
    • ukuina 1 day ago
      You could use a Visual LLM to transcribe the PDF back into JSON data for you.

      Something like: ghostpdf to convert PDF into images, then gpt-4o or ollama+Llama3 to transcribe each image into output JSON.

      • eth0up 1 day ago
        First, let me admit I'm slow to understand, and I also may have explained the above poorly.

        The PDF is thousands of newlines, with multiple entries on each, convoluted with lots of garbage formatting. The only data to be preserved are the winning number, date, evening/midday draw, and fireball number (introduced in 2020-ish?), factored in along with the change from one to two daily draws (2001-ish) as acceptable anomalies in the data set, which has been done, I believe.

        The difficult part of this, actually, was cleaning the squalid pdf.

        In the end, after the work was all done, the OCR script would successfully append a number to the cleaned text/PDF, but usually not the correct one. The only reason I used OCR was that I couldn't find the right frames in the webpage that contained the latest winning numbers, and getting HTML extraction to work in a script failed because of it.

        I must admit, although I have used JSON files, I don't know much about them. Additionally, I'm ignorant enough that it's probably best not to attempt to advise me too much here for sake of thread sanitation - it could get bloated and off topic :)

        I think with renewed inspiration I could figure out a successful method to keep the public file updated, but I primarily need surefire methods of analysis on the numbers for my anomaly detection, which is a challenge for a caveman who never went to middle/high school and didn't resume school beyond 4th grade until community college much later. Of course, the fact that such an animal can twiddle with statistics and data analysis is a big testament to the positive attributes of LLMs, without which the pursuit would be a vague thought at most.

        Although I welcome and appreciate any feedback, I'm pretty sure it isn't too welcome here. I'll try to make sense of your suggestions though.

    • sigma33 12 hours ago
      Could just use the API the web page uses and parse the JSON

      https://apim-website-prod-eastus.azure-api.net/drawgamesapp/...

      Gets you Pick 4 for 6 Jan; easy to parse.

      .... "FireballPayouts": 18060.5, "DrawNumbers": [ { "NumberPick": 3, "NumberType": "wn1" }, { "NumberPick": 0, "NumberType": "wn2" }, { "NumberPick": 0, "NumberType": "wn3" }, { "NumberPick": 4, "NumberType": "wn4" }, { "NumberPick": 1, "NumberType": "fb" } ...

      • eth0up 11 hours ago
        For now, it's over me 'ead, but I'll make more sense of it soon. Thanks
  • lebotte 1 day ago
    Time-series anomaly detection involves using techniques like forecasting and historical data offsets to dynamically identify deviations in patterns, as discussed in practical applications with tools like Prometheus and Grafana.