Here is the list of facts I collected. In this context, the term memory (de)allocation seems more appropriate than GC.
My principal information sources are Loren's blog (especially its comments) and this article from MATLAB Digest.
Because of its orientation toward numeric computing with possibly large data sets, MATLAB does a really good job of optimizing the performance of stack objects, e.g. in-place operations on data and copy-on-write for function arguments (effectively call-by-reference as long as the argument is not modified). Also because of this orientation, its memory model is fundamentally different from that of OO languages such as Java.
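To make those two optimizations concrete, here is a minimal sketch (the function names are mine; the in-place condition, as described on Loren's blog, is that the input and the output share the same variable both inside the function and at the call site):

```matlab
function demo()
    a = rand(5000);        % ~200 MB of doubles
    b = a;                 % no copy yet: b shares a's data (copy-on-write)
    b(1) = 0;              % the actual copy is made only on this write
    a = scale(a);          % output overwrites input: eligible for in-place execution
end

function x = scale(x)      % matching input/output variable names
    x = 2 * x;             % may run without allocating a temporary copy
end
```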
MATLAB officially had no user-defined heap memory until version 7 (version 6 had undocumented reference functionality via schema.m files). MATLAB 7 has a heap, both in the form of nested functions (closures) and of handle objects, and their implementations share the same underpinnings. As a side note, OO can be emulated with closures in MATLAB, which is interesting for pre-2008a releases (before classdef); see the sketch below.
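A minimal sketch of that emulation (the counter example is mine, not from the cited sources): the "methods" are handles to nested functions that share one captured workspace, which plays the role of the object's heap-allocated state.

```matlab
function [increment, getCount] = makeCounter()
% makeCounter.m - an "object" built from closures: both returned handles
% share the enclosing workspace, so they see each other's updates.
    count = 0;                 % the object's private state
    increment = @inc;
    getCount  = @get;
    function inc()
        count = count + 1;     % nested functions can mutate the shared state
    end
    function n = get()
        n = count;
    end
end
```

Usage: `[inc, cnt] = makeCounter(); inc(); inc(); cnt()` returns 2, and the captured workspace stays alive as long as either handle does.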
Surprisingly, it is possible to examine the entire workspace of the enclosing function captured by a function handle (closure); see the function functions(fhandle) in the MATLAB Help. It means that the enclosing workspace is frozen in memory. This is why cellfun/arrayfun are sometimes very slow when used inside nested functions.
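A small sketch of how to observe this (all names are mine):

```matlab
function fh = makeHandle()
% makeHandle.m - returns a handle to a nested function.
    bigUnused = rand(1000);    % never referenced by the nested function
    x = 42;
    fh = @nested;
    function y = nested()
        y = x;
    end
end

% At the prompt:
%   info = functions(makeHandle());
%   info.workspace{:}   % per the observation above: the frozen enclosing
%                       % workspace, bigUnused included
```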
There are also interesting posts by Loren and Brad Phelan on object cleanup.
The most interesting fact about heap deallocation in MATLAB is, in my opinion, that MATLAB attempts it every time the stack is deallocated, i.e. on leaving every function. This has advantages, but it also carries a huge CPU penalty when heap deallocation is slow. And in some scenarios it is actually very slow in MATLAB!
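A minimal sketch of this deterministic behavior (it uses classdef, so it needs R2008a or later; the class and function names are hypothetical):

```matlab
% Resource.m
classdef Resource < handle
    methods
        function delete(obj)
            disp('Resource cleaned up');   % fires as soon as the last reference dies
        end
    end
end

% useResource.m
% function useResource()
%     r = Resource();      % heap object
%     disp('working...');
% end
%
% Calling useResource prints 'working...' and then, on leaving the function
% (i.e. when its stack frame is deallocated), 'Resource cleaned up'.
```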
The performance problems that MATLAB's memory deallocation can cause are pretty bad. I always notice that I have unintentionally introduced a cyclic reference into my code when it suddenly runs 20x slower, and sometimes needs several seconds between leaving a function and returning to its caller (time spent on cleanup). It is a known problem; see Dave Foti and this older forum post, whose code was used to make this picture visualizing performance (the tests were run on different machines, so comparing absolute timings across MATLAB versions is meaningless):

[Figure: execution time vs. object pool size for several MATLAB versions. Legend: shorter execution time is better.]
A linear increase in pool size for reference objects means a polynomial (or exponential) decrease in MATLAB performance! For value objects the performance is, as expected, linear.
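A sketch in the spirit of that benchmark, not the original forum code:

```matlab
% SimpleHandle.m - a hypothetical, empty reference class:
%
%   classdef SimpleHandle < handle
%   end

for n = [1000 2000 4000 8000]
    pool = cell(1, n);
    for k = 1:n
        pool{k} = SimpleHandle();    % a pool of reference objects
    end
    tic;
    clear pool                       % time only the deallocation
    fprintf('n = %5d: %.3f s\n', n, toc);
end
```

Repeating the same loop with a value object (e.g. a struct) in place of SimpleHandle gives the linear baseline.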
Considering these facts, I can only speculate that MATLAB uses a not-yet-very-efficient form of reference counting for heap deallocation.
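Cycles are the classic weak spot of plain reference counting, which would fit what I observe; a sketch of how easily one arises (Node is a hypothetical handle class):

```matlab
% Node.m - hypothetical:
%
%   classdef Node < handle
%       properties
%           next
%       end
%   end

a = Node();
b = Node();
a.next = b;
b.next = a;    % reference cycle: neither refcount can ever reach zero
clear a b      % MATLAB does still clean the pair up eventually, but cycles
               % like this are exactly when I see cleanup time explode
```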
EDIT: I had always encountered this performance problem with many small nested functions, but recently I noticed that, at least in 2006a, the cleanup of a single nested scope holding some megabytes of data is also terrible: it takes 1.5 seconds just to set a nested-scope variable to empty!
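A sketch of that scenario (names and sizes are mine):

```matlab
function nestedCleanupDemo()
% nestedCleanupDemo.m - a nested function shares a large array with its parent.
    data = zeros(2000);    % ~32 MB, shared with the nested scope
    wipe();
    function wipe()
        tic
        data = [];         % on the releases mentioned above, this alone can take seconds
        fprintf('emptying nested-scope data: %.2f s\n', toc);
    end
end
```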
EDIT 2: I finally got the answer, from Dave Foti himself. He acknowledges the flaws but says that MATLAB is going to retain its present deterministic cleanup approach.