Facebook open sources oomd: A new way to handle out-of-memory issues
Recently, on Facebook’s website, the company’s Daniel Xu announced that oomd has been released as open source under the GPLv2 license. oomd is a userspace Out-Of-Memory (OOM) killer for Linux systems, and it was mentioned in a recent article on block I/O latency controllers. When there is not enough memory, the Linux Out-Of-Memory killer kills some processes; its primary task is to protect the kernel, so applications may suffer as a result.
Unlike the traditional Linux Out-Of-Memory killer, oomd monitors the system as a whole to assess whether a workload has put it into a state it cannot recover from. It then takes corrective action in userspace before the kernel’s OOM killer steps in.
Facebook says its infrastructure has evolved to include News Feed, Messenger, Instagram, WhatsApp, Oculus and a host of other products. These products and the systems behind them run on millions of servers spread across multiple geographically distributed data centres. As the infrastructure continues to expand, Facebook’s machines and networks increasingly span multiple generations. A side effect of this multi-generation production environment is that a new software version or configuration change may work correctly on one machine yet run into an Out-Of-Memory (OOM) problem on another. The traditional Linux Out-Of-Memory killer works well in some cases, but in others it kicks in too late, leaving the system in a livelock of indeterminate duration.
So Facebook developed oomd, a faster, more reliable solution for common Out-Of-Memory (OOM) situations that runs in user space rather than kernel space. According to Facebook, oomd has two key features: pre-OOM hooks and a custom plugin system. The pre-OOM hooks provide visibility into an impending OOM before the workload is threatened. Because the criteria for detecting an OOM vary by workload, the plugin system supports customisation of both the detection policy and the process termination policy.
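The split between detection policy and termination policy can be illustrated with a minimal Python sketch. This is not oomd's actual plugin API: every name below (`Snapshot`, `PressureAbove`, `KillLargest`, `run_once`) is hypothetical, the pressure figure is an assumed stand-in for whatever signal a real detector would watch, and the kill step is passed in as a callback so that nothing is actually terminated.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Protocol


@dataclass
class ProcessInfo:
    pid: int
    name: str
    rss_mb: float  # resident memory, in MiB


@dataclass
class Snapshot:
    # Hypothetical metric: percentage of time tasks were stalled on memory.
    mem_pressure_pct: float
    processes: List[ProcessInfo]


class Detector(Protocol):
    """Detection policy: decides whether the system is in trouble."""
    def triggered(self, snap: Snapshot) -> bool: ...


class Action(Protocol):
    """Termination policy: decides which process to sacrifice."""
    def pick_victim(self, snap: Snapshot) -> Optional[ProcessInfo]: ...


class PressureAbove:
    """Example detector: fires when memory pressure exceeds a threshold."""
    def __init__(self, threshold_pct: float) -> None:
        self.threshold_pct = threshold_pct

    def triggered(self, snap: Snapshot) -> bool:
        return snap.mem_pressure_pct > self.threshold_pct


class KillLargest:
    """Example action: picks the process with the largest resident set."""
    def pick_victim(self, snap: Snapshot) -> Optional[ProcessInfo]:
        return max(snap.processes, key=lambda p: p.rss_mb, default=None)


def run_once(
    snap: Snapshot,
    detector: Detector,
    action: Action,
    kill: Callable[[int], None],
) -> Optional[ProcessInfo]:
    """Run one monitoring pass; return the victim, or None if nothing was done."""
    if not detector.triggered(snap):
        return None
    victim = action.pick_victim(snap)
    if victim is not None:
        kill(victim.pid)  # injected, so the sketch never kills a real process
    return victim
```

Because the detector and the action are separate objects, a latency-sensitive workload could swap in a lower threshold or a different victim selection rule without touching the monitoring loop, which is the kind of per-workload customisation the plugin system is described as enabling.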
Source, Image: code.fb.com