The kernel documentation on overcommit accounting has some details: https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
The Linux kernel supports the following overcommit handling modes
0 - Heuristic overcommit handling.
Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing
overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.
Also, Documentation/sysctl/vm.txt:
overcommit_memory:
This value contains a flag that enables memory overcommitment.
When this flag is 0, the kernel attempts to estimate the amount of
free memory left when userspace requests more memory...
See Documentation/vm/overcommit-accounting and
mm/mmap.c::__vm_enough_memory() for more information.
Also, man 5 proc:
/proc/sys/vm/overcommit_memory
This file contains the kernel virtual memory accounting mode.
Values are:
0: heuristic overcommit (this is the default)
1: always overcommit, never check
2: always check, never overcommit
In mode 0, calls of mmap(2) with MAP_NORESERVE are not checked, and the default check is very weak, leading to the risk of getting a process "OOM-killed".
So very large allocations are refused by the heuristic, but an application can sometimes allocate more virtual memory than the physical memory in the system, as long as it does not actually use all of it. With MAP_NORESERVE the amount of mappable memory may be even higher.
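To see which of the three modes a machine is running, the sysctl file quoted above can be read directly. The following is a minimal userspace sketch; the helper names overcommit_mode_name and read_overcommit_mode are made up for this illustration and are not kernel or libc APIs.

```c
#include <stdio.h>

/* Map a vm.overcommit_memory value to the wording used in proc(5). */
const char *overcommit_mode_name(int mode)
{
    switch (mode) {
    case 0:  return "heuristic overcommit";
    case 1:  return "always overcommit, never check";
    case 2:  return "always check, never overcommit";
    default: return "unknown";
    }
}

/*
 * Read the mode from the given path (normally
 * /proc/sys/vm/overcommit_memory). Returns -1 if the file
 * cannot be opened or does not start with an integer.
 */
int read_overcommit_mode(const char *path)
{
    FILE *f = fopen(path, "r");
    int mode = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

Calling overcommit_mode_name(read_overcommit_mode("/proc/sys/vm/overcommit_memory")) from a small main() should give the description of the active mode on a Linux system.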
The documentation says "The overcommit policy is set via the sysctl `vm.overcommit_memory'", so we can look up how that sysctl is implemented in the source code (Linux 4.4):
http://lxr.free-electrons.com/ident?v=4.4;i=sysctl_overcommit_memory, defined at line 112 of mm/mmap.c
112 int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS; /* heuristic overcommit */
The constant OVERCOMMIT_GUESS (defined in linux/mman.h) is actually used only at line 170 of mm/mmap.c, which is the implementation of the heuristic:
138 /*
139  * Check that a process has enough memory to allocate a new virtual
140  * mapping. 0 means there is enough memory for the allocation to
141  * succeed and -ENOMEM implies there is not.
142  *
143  * We currently support three overcommit policies, which are set via the
144  * vm.overcommit_memory sysctl. See Documentation/vm/overcommit-accounting
145  *
146  * Strict overcommit modes added 2002 Feb 26 by Alan Cox.
147  * Additional code 2002 Jul 20 by Robert Love.
148  *
149  * cap_sys_admin is 1 if the process has admin privileges, 0 otherwise.
150  *
151  * Note this is a helper function intended to be used by LSMs which
152  * wish to use this logic.
153  */
154 int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
...
170         if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
171                 free = global_page_state(NR_FREE_PAGES);
172                 free += global_page_state(NR_FILE_PAGES);
173
174                 /*
175                  * shmem pages shouldn't be counted as free in this
176                  * case, they can't be purged, only swapped out, and
177                  * that won't affect the overall amount of available
178                  * memory in the system.
179                  */
180                 free -= global_page_state(NR_SHMEM);
181
182                 free += get_nr_swap_pages();
183
184                 /*
185                  * Any slabs which are created with the
186                  * SLAB_RECLAIM_ACCOUNT flag claim to have contents
187                  * which are reclaimable, under pressure. The dentry
188                  * cache and most inode caches should fall into this
189                  */
190                 free += global_page_state(NR_SLAB_RECLAIMABLE);
191
192                 /*
193                  * Leave reserved pages. The pages are not for anonymous pages.
194                  */
195                 if (free <= totalreserve_pages)
196                         goto error;
197                 else
198                         free -= totalreserve_pages;
199
200                 /*
201                  * Reserve some for root
202                  */
203                 if (!cap_sys_admin)
204                         free -= sysctl_admin_reserve_kbytes >> (PAGE_SHIFT - 10);
205
206                 if (free > pages)
207                         return 0;
208
209                 goto error;
210         }
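So the heuristic estimates how many pages could be made available right now (free: free pages, plus page cache minus shmem, plus free swap, plus reclaimable slab, less the reserves) at the moment a request for more memory is processed (the application asks for pages pages). The request is allowed only if free > pages.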
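The same arithmetic can be replayed in userspace to see where the cut-off falls. This is only a sketch of the logic quoted above: struct vm_snapshot and heuristic_allows are hypothetical names, and the counters are passed in as plain numbers (in pages) instead of being read from the kernel.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the counters the heuristic reads, all in pages. */
struct vm_snapshot {
    long free_pages;       /* NR_FREE_PAGES */
    long file_pages;       /* NR_FILE_PAGES (page cache) */
    long shmem_pages;      /* NR_SHMEM */
    long swap_free_pages;  /* get_nr_swap_pages() */
    long slab_reclaimable; /* NR_SLAB_RECLAIMABLE */
    long total_reserve;    /* totalreserve_pages */
    long admin_reserve;    /* sysctl_admin_reserve_kbytes, already in pages */
};

/* Mirrors the OVERCOMMIT_GUESS branch: true means the request would pass. */
bool heuristic_allows(const struct vm_snapshot *s, long pages, bool cap_sys_admin)
{
    long free = s->free_pages + s->file_pages - s->shmem_pages
              + s->swap_free_pages + s->slab_reclaimable;

    /* Leave the reserved pages alone. */
    if (free <= s->total_reserve)
        return false;
    free -= s->total_reserve;

    /* Keep a little extra back for root. */
    if (!cap_sys_admin)
        free -= s->admin_reserve;

    return free > pages;
}
```

Note that the check looks only at the size of the single request, not at how much the process or the system has already committed, which is why mode 0 still allows the total to exceed physical memory.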
With overcommit always enabled ("1"), this function always returns 0 ("there is enough memory for this request"):
164         /*
165          * Sometimes we want to use more memory than we have
166          */
167         if (sysctl_overcommit_memory == OVERCOMMIT_ALWAYS)
168                 return 0;
Without the default heuristic, in mode "2", the kernel accounts the requested pages to get the new Committed_AS (as shown in /proc/meminfo):
162 vm_acct_memory(pages);
...
This is actually just an increment of vm_committed_as: __percpu_counter_add(&vm_committed_as, pages, vm_committed_as_batch);
212 allowed = vm_commit_limit();
The limit itself is computed in vm_commit_limit():
401 /*
402  * Committed memory limit enforced when OVERCOMMIT_NEVER policy is used
403  */
404 unsigned long vm_commit_limit(void)
405 {
406         unsigned long allowed;
407
408         if (sysctl_overcommit_kbytes)
409                 allowed = sysctl_overcommit_kbytes >> (PAGE_SHIFT - 10);
410         else
411                 allowed = ((totalram_pages - hugetlb_total_pages())
412                            * sysctl_overcommit_ratio / 100);
413         allowed += total_swap_pages;
414
415         return allowed;
416 }
So allowed is taken either from the vm.overcommit_kbytes sysctl (in kilobytes) or from vm.overcommit_ratio (a percentage of physical RAM, excluding huge pages), and in both cases the total swap size is added.
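A worked example may make the formula concrete. The sketch below restates vm_commit_limit() as a pure function; commit_limit_pages and its parameter list are assumptions made for this illustration, with the kernel globals turned into arguments.

```c
/*
 * Userspace restatement of vm_commit_limit().
 * All sizes are in pages except overcommit_kbytes, which is in kB.
 * page_shift is log2 of the page size (12 for 4 KiB pages).
 */
unsigned long commit_limit_pages(unsigned long totalram_pages,
                                 unsigned long hugetlb_pages,
                                 unsigned long total_swap_pages,
                                 unsigned long overcommit_kbytes,
                                 unsigned int overcommit_ratio,
                                 unsigned int page_shift)
{
    unsigned long allowed;

    if (overcommit_kbytes)
        /* An explicit kB limit takes precedence over the ratio. */
        allowed = overcommit_kbytes >> (page_shift - 10);
    else
        allowed = (totalram_pages - hugetlb_pages) * overcommit_ratio / 100;

    /* Swap is added in both cases. */
    return allowed + total_swap_pages;
}
```

For instance, with 4 GiB of RAM (1048576 pages of 4 KiB), no huge pages, 2 GiB of swap (524288 pages) and a ratio of 50, commit_limit_pages(1048576, 0, 524288, 0, 50, 12) gives 524288 + 524288 = 1048576 pages, that is, 4 GiB.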
213         /*
214          * Reserve some for root
215          */
216         if (!cap_sys_admin)
217                 allowed -= sysctl_admin_reserve_kbytes >> (PAGE_SHIFT - 10);
This keeps some amount of memory available only to root. (PAGE_SHIFT is 12 on the usual configurations with 4 KiB pages; shifting right by PAGE_SHIFT - 10 is just a conversion from kilobytes to a page count.)
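The shift trick is easy to check by hand. In this small sketch, kbytes_to_pages and page_shift_from_size are illustrative helper names, not kernel functions; the second one derives the shift from a page size such as the value returned by sysconf(_SC_PAGESIZE).

```c
/* Smallest shift such that (1 << shift) >= page_size; 12 for 4096. */
unsigned int page_shift_from_size(long page_size)
{
    unsigned int shift = 0;

    while ((1L << shift) < page_size)
        shift++;
    return shift;
}

/*
 * Convert kilobytes to pages the way the kernel code does.
 * Assumes page_shift >= 10, i.e. pages of at least 1 KiB.
 */
unsigned long kbytes_to_pages(unsigned long kbytes, unsigned int page_shift)
{
    return kbytes >> (page_shift - 10);
}
```

With 4 KiB pages a page holds 4 kB, so the shift is by 2: a reserve of 8192 kB (an illustrative value) becomes 8192 >> 2 = 2048 pages. With 64 KiB pages the same reserve would be 8192 >> 6 = 128 pages.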
218
219         /*
220          * Don't let a single process grow so big a user can't recover
221          */
222         if (mm) {
223                 reserve = sysctl_user_reserve_kbytes >> (PAGE_SHIFT - 10);
224                 allowed -= min_t(long, mm->total_vm / 32, reserve);
225         }
226
227         if (percpu_counter_read_positive(&vm_committed_as) < allowed)
228                 return 0;
If, after accounting for the request, the total amount of memory committed by all of userspace is still less than allowed, the request succeeds. Otherwise the request is denied (and unaccounted):
229 error:
230         vm_unacct_memory(pages);
231
232         return -ENOMEM;
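Putting the mode 2 steps together, the decision can be condensed into one pure function. This is a sketch under stated assumptions: strict_allows and min_long are hypothetical names, every quantity is already in pages, and the caller is assumed to have an mm (the kernel skips the per-process reserve when mm is NULL).

```c
#include <stdbool.h>

static long min_long(long a, long b)
{
    return a < b ? a : b;
}

/*
 * Sketch of the OVERCOMMIT_NEVER decision.
 * committed_after_request: vm_committed_as with the request already added
 * commit_limit:            result of vm_commit_limit()
 * admin_reserve:           sysctl_admin_reserve_kbytes converted to pages
 * user_reserve:            sysctl_user_reserve_kbytes converted to pages
 * mm_total_vm:             mm->total_vm of the requesting process
 */
bool strict_allows(long committed_after_request, long commit_limit,
                   long admin_reserve, long user_reserve,
                   long mm_total_vm, bool cap_sys_admin)
{
    long allowed = commit_limit;

    /* Reserve some for root. */
    if (!cap_sys_admin)
        allowed -= admin_reserve;

    /* Don't let a single process grow so big a user can't recover. */
    allowed -= min_long(mm_total_vm / 32, user_reserve);

    return committed_after_request < allowed;
}
```

As an example with made-up numbers: a limit of 1000 pages, an admin reserve of 50, a user reserve of 100 and a process with total_vm of 640 pages give allowed = 1000 - 50 - min(20, 100) = 930 for an unprivileged process, so the request passes only while the committed total stays below 930.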
In other words, as summarized in "The Linux kernel. Some remarks on the Linux Kernel", 2003-02-01 by Andries Brouwer, 9. Memory, 9.6 Overcommit and OOM - https://www.win.tue.nl/~aeb/linux/lk/lk-9.html:
Going in the right direction
Since 2.5.30 the values are:
0 (default): as before: guess about how much overcommitment is reasonable,
1: never refuse any malloc(),
2: be precise about the overcommit - never commit a virtual address space larger than swap space plus a fraction overcommit_ratio of the physical memory.
So "2" is a precise calculation of the amount of memory committed after the request, and "0" is a heuristic estimate.
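Both sides of the mode "2" comparison are visible from userspace as the CommitLimit and Committed_AS fields of /proc/meminfo. The following is a minimal sketch of extracting them from the file's text; meminfo_value_kb is a name invented for this example, and it expects the file contents to have been read into a NUL-terminated buffer already.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find a "<key>: <value> kB" line in /proc/meminfo-style text.
 * Returns the value in kB, or -1 if the key is missing or malformed.
 */
long meminfo_value_kb(const char *text, const char *key)
{
    size_t keylen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* A match needs the full key followed directly by ':'. */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            long value;

            if (sscanf(line + keylen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

In mode "2", the Committed_AS value obtained this way should stay below CommitLimit (less the root and per-process reserves); in modes "0" and "1" it can exceed it.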