Our team develops WordPress plugins and provides hosted instances on a couple of independent servers. Our WordPress installation is managed with Git; all servers have the same source and WordPress setup deployed, and only the domains and the actual data in the database vary. For each installation, MySQL runs on the same host. WordPress is the only application running on each server.
However, after deploying this setup on a Windows Server 2008 R2 machine, we noticed a drastic performance difference compared to our other servers: page generation time goes up from an average of 400ms to 4000-5000ms for pages generated with PHP. For static resources delivered by Apache alone, speed is about the same as on Linux.
So we took some steps to narrow down the problem:
- Make sure there is no antivirus software running and no other Windows domain services interfering
- Collect profiling data to identify the timekillers during script execution
- Test different server & hardware setups
- Double-check both Apache and PHP configuration for obvious configuration errors
After some profiling we quickly noticed that the evaluation of regular expressions is horribly slow on our Windows machines. Evaluating 10,000 regular expressions (preg_match) takes about 90ms on Linux and 3000ms on Windows.
Profiling, system tests and configuration details are provided below. We don't want to optimize this script (which we do know how to do). We want to get the script to run at approximately the same speed on Windows as on Linux (given the same setup regarding opcache/...). There is no need to optimize the memory footprint of the script either.
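To compare machines without WordPress in the picture, the regular expression measurement can be reproduced with a stand-alone script. The following is only a minimal sketch: the function names `build_patterns` and `time_patterns`, the synthetic pattern shape and the sample user agent are made up for illustration and are not taken from wp-slimstat. Real browscap patterns may behave differently, so loading the actual `browscap.php` arrays would give more representative numbers.

```php
<?php
// Hypothetical stand-alone benchmark; not part of WordPress or wp-slimstat.

/**
 * Build $count synthetic patterns that do not match the test user agent,
 * so that every single one of them has to be evaluated.
 */
function build_patterns($count)
{
    $patterns = array();
    for ($i = 0; $i < $count; $i++) {
        // Shape loosely modelled on browscap-style patterns (assumption).
        $patterns[] = '#^mozilla/5\.0 \(.*bot' . $i . '.*\) .*version/' . ($i % 50) . '\..*$#';
    }
    return $patterns;
}

/**
 * Run preg_match() once per pattern and return the elapsed milliseconds.
 * The 'i' modifier is appended the same way the plugin code does it.
 */
function time_patterns(array $patterns, $subject)
{
    $start = microtime(true);
    foreach ($patterns as $pattern) {
        preg_match($pattern . 'i', $subject);
    }
    return (microtime(true) - $start) * 1000.0;
}

$userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
           . '(KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36';

$patterns = build_patterns(10000);
$elapsed  = time_patterns($patterns, $userAgent);

printf(
    "%s / PHP %s / PCRE %s: %d patterns in %.1f ms\n",
    PHP_OS, PHP_VERSION, PCRE_VERSION, count($patterns), $elapsed
);
```

Running the same file from the command line and through the web server on each machine should show whether the slowdown follows the PHP build, the SAPI, or the host itself.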
Update: After some time, the system seems to run out of memory, triggering out-of-memory errors at random allocations. See below for more details. Restarting Apache/PHP fixed the problem for now.
Trace to _get_browser is:
File (called from)
require wp-blog-header.php (index.php:17)
wp (wp-blog-header.php:14)
WP->main (functions.php:808)
php::do_action_ref_array (class-wp.php:616)
php::call_user_func_array (wp-includes/plugin:507)
wp_slimstat::slimtrack (php::internal (507))
wp_slimstat::_get_browser (wp-slimstat.php:385)
Update 2: For some reason I can't remember, we went back to running PHP as an Apache module on our servers (the same ones that delivered bad performance). But today they run blazingly fast (~1 s/request). Adding OPcache brings this down to ~400 ms/request. Apache/PHP/Windows remained the same.
1) Profiling Results
Profiling was done with Xdebug on all machines. Usually we only collected a few runs - those were enough to reveal the location where most of the time (50%+) was spent: the method [get_browser][1] of the WordPress plugin wp-slimstats:
    protected static function _get_browser(){
        // Load cache
        @include_once(plugin_dir_path( __FILE__ ).'databases/browscap.php');
        // browscap.php contains $slimstat_patterns and $slimstat_browsers
        $browser = array('browser' => 'Default Browser', 'version' => '1', 'platform' => 'unknown', 'css_version' => 1, 'type' => 1);
        if (empty($slimstat_patterns) || !is_array($slimstat_patterns)) return $browser;
        $user_agent = isset($_SERVER['HTTP_USER_AGENT'])?$_SERVER['HTTP_USER_AGENT']:'';
        $search = array();
        foreach ($slimstat_patterns as $key => $pattern){
            if (preg_match($pattern . 'i', $user_agent)){
                $search = $value = $search + $slimstat_browsers[$key];
                while (array_key_exists(3, $value) && $value[3]) {
                    $value = $slimstat_browsers[$value[3]];
                    $search += $value;
                }
                break;
            }
        }
        // Lots of other lines not relevant to the profiling results
    }
This function, similar to PHP's get_browser, detects the browser's capabilities and OS. Most of the script execution time is spent in this foreach loop, evaluating all those preg_match calls (approx. 8,000 to 10,000 per page request). This takes about 90ms on Linux and 3000ms on Windows. Results were the same on all setups tested (picture shows data of two executions):
Sure, loading two huge arrays takes some time, and so does evaluating regular expressions. But we'd expect them to take approximately the same time on Linux and Windows. This is the profiling result on a Linux VM (one page request only). The difference is pretty obvious:
Another time killer was actually the Object-Cache WordPress uses:
    function get( $key, $group = 'default', $force = false, &$found = null ) {
        if ( empty( $group ) )
            $group = 'default';
        if ( $this->multisite && ! isset( $this->global_groups[ $group ] ) )
            $key = $this->blog_prefix . $key;
        if ( $this->_exists( $key, $group ) ) {
            $found = true;
            $this->cache_hits += 1;
            if ( is_object($this->cache[$group][$key]) )
                return clone $this->cache[$group][$key];
            else
                return $this->cache[$group][$key];
        }
        $found = false;
        $this->cache_misses += 1;
        return false;
    }
Time is spent within this function itself (3 script executions):
On Linux:
The last really big time killer was translations. Each translation, loaded from memory, takes anything from 0.2ms to 4ms in WordPress:
On Linux:
2) Tested systems
In order to make sure that neither virtualization nor Apache affects this, we tested on several setups. Antivirus software was disabled on all setups:
- Debian Linux with Apache 2 and PHP at up-to-date stable releases. This is the same for developers running it in their virtual machines as for staging/live servers, and it acts as the reference system for the desired performance. These run either in our office or at a hosting provider (shared space). Windows systems had between 4GB and 8GB of RAM; memory usage was below 50% at all times. Virtualizations never run Windows & Apache at the same time.
- Live servers, running at T-Systems (managed virtualized servers), on VMware Player
    - Win 2008 R2. Apache 2.2.25 + PHP 5.4.26 NTS,VC9 as fastcgi module
    - Win 2008 R2. Apache 2.2.25 + PHP 5.5.1 NTS,VC11 as fastcgi module
    - Win 2008 R2. Apache 2.2.25 + PHP 5.5.1 NTS,VC11 as apache module
    - Win 2008 R2. Apache 2.2.25 + PHP 5.5.11 TS,VC11 as apache module (that's the fast one I mentioned in update 2)
- On a local machine, Host: OpenSuse, Virtualization: VMware Player, same as at T-Systems, to rule out their infrastructure influencing us:
    - Win 2008 R2. Apache 2.2.25 + PHP 5.4.26 NTS,VC9 as fastcgi module
    - Win 2008 R2. IIS7 + PHP 5.4.26 NTS,VC9 as fastcgi module (with and without wincache)
    - Win 2012. IIS * + PHP 5.5.10 NTS,VC11 as fastcgi module (with and without wincache)
- On a local machine without virtualization
    - Win 2008 R2. Apache 2.2.25 + PHP 5.4.26 NTS,VC9 as fastcgi module
The profiling results mentioned above were the same on the different systems (~10% deviation). Windows was always slower than Linux by a significant factor.
Using a fresh install of WordPress & Slimstats resulted in approx. the same results. Rewriting the code is not an option here.
Update: Meanwhile we found two other Windows Systems (both Windows 2008 R2, VM & Phys) where this complete stack runs quite fast. Same configuration though.
Update 2: Running PHP as an Apache module on the live servers was somewhat faster than the fastcgi method: down to ~2sec, 50% less.
Running out of Memory
After some time, our live server stops working entirely, triggering these out-of-memory errors:
PHP Fatal error: Out of memory (allocated 4456448) (tried to allocate 136 bytes)
PHP Fatal error: Out of memory (allocated 8650752) (tried to allocate 45 bytes)
PHP Fatal error: Out of memory (allocated 6815744) (tried to allocate 24 bytes)
This happens at random script locations. Apparently the Zend Memory Manager is not able to allocate more memory, although the scripts would be allowed to do so. At the time of the incident, the server had about 50% of its RAM free (2GB+), so the server does not actually run out of RAM. Restarting Apache/PHP fixed this problem for now.
We are not sure whether this problem is related to the performance issues here. Yet as both issues seem to be memory related, it's included here. In particular, we'll try to reproduce the settings of the Windows tests that provided decent performance.
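One way to gather more data on these failures is to log memory figures at the end of every request, so that the requests leading up to an out-of-memory error can be compared with normal ones. This is only a sketch: the function names `memory_summary` and `log_memory_on_shutdown` are invented for illustration, and where to load the file (for example from a must-use plugin) is an assumption. Note also that the shutdown function itself may fail to run if the process cannot allocate any memory at all.

```php
<?php
// Hypothetical diagnostic; function names are invented for illustration.

/**
 * Format a one-line memory summary for the current request.
 */
function memory_summary()
{
    return sprintf(
        'pid=%d limit=%s used=%d peak=%d real_peak=%d',
        getmypid(),
        ini_get('memory_limit'),
        memory_get_usage(),          // bytes currently used by the script
        memory_get_peak_usage(),     // highest value of the above so far
        memory_get_peak_usage(true)  // peak of memory obtained from the system
    );
}

/**
 * Write the summary to the PHP error log; if the request ended with a
 * fatal error, append its message so both show up on the same line.
 */
function log_memory_on_shutdown()
{
    $line  = memory_summary();
    $error = error_get_last();
    if ($error !== null && $error['type'] === E_ERROR) {
        $line .= ' fatal="' . $error['message'] . '"';
    }
    error_log($line);
}

register_shutdown_function('log_memory_on_shutdown');
```

The process id is included because a long-running Apache child that degrades over time would show up as failures clustered on one pid.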
3) Apache & PHP Configuration
... probably do not have any common pitfalls. Output buffering is enabled (default value), multibyte override is disabled, ... If any options are of interest, we'll happily provide them.
Output of httpd.exe -V
Server version: Apache/2.4.7 (Win32)
Apache Lounge VC10 Server built: Nov 26 2013 15:46:56
Server's Module Magic Number: 20120211:27
Server loaded: APR 1.5.0, APR-UTIL 1.5.3
Compiled using: APR 1.5.0, APR-UTIL 1.5.3
Architecture: 32-bit
Server MPM: WinNT
threaded: yes (fixed thread count)
forked: no
Server compiled with....
-D APR_HAS_SENDFILE
-D APR_HAS_MMAP
-D APR_HAVE_IPV6 (IPv4-mapped addresses disabled)
-D APR_HAS_OTHER_CHILD
-D AP_HAVE_RELIABLE_PIPED_LOGS
-D DYNAMIC_MODULE_LIMIT=256
-D HTTPD_ROOT="/apache"
-D SUEXEC_BIN="/apache/bin/suexec"
-D DEFAULT_PIDLOG="logs/httpd.pid"
-D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
-D DEFAULT_ERRORLOG="logs/error.log"
-D AP_TYPES_CONFIG_FILE="conf/mime.types"
-D SERVER_CONFIG_FILE="conf/httpd.conf"
mpm_winnt_module configuration:
<IfModule mpm_winnt_module>
ThreadsPerChild 150
ThreadStackSize 8388608
MaxConnectionsPerChild 0
</IfModule>
Excerpt of php.ini:
realpath_cache_size = 12M
pcre.recursion_limit = 100000
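Since the same stack is fast on some Windows machines and slow on others, it may help to dump the runtime settings on each machine and diff the output. The sketch below uses only built-in PHP functions and constants; the function name `collect_environment` and the selection of settings are assumptions about what might be relevant, not a complete list.

```php
<?php
// Hypothetical helper for diffing two machines; the function name is made up.

/**
 * Collect build and ini details that could plausibly differ between hosts.
 */
function collect_environment()
{
    return array(
        'php_version'          => PHP_VERSION,
        'os'                   => PHP_OS,
        'sapi'                 => php_sapi_name(),
        'int_size_bytes'       => PHP_INT_SIZE, // integer width of this build
        'thread_safe'          => defined('PHP_ZTS') ? (bool) PHP_ZTS : null,
        'pcre_version'         => PCRE_VERSION,
        'pcre.backtrack_limit' => ini_get('pcre.backtrack_limit'),
        'pcre.recursion_limit' => ini_get('pcre.recursion_limit'),
        'realpath_cache_size'  => ini_get('realpath_cache_size'),
        'memory_limit'         => ini_get('memory_limit'),
        'output_buffering'     => ini_get('output_buffering'),
        'opcache_loaded'       => extension_loaded('Zend OPcache'),
        'xdebug_loaded'        => extension_loaded('xdebug'),
        'loaded_ini'           => php_ini_loaded_file(),
    );
}

// One "name value" pair per line so the output diffs cleanly.
foreach (collect_environment() as $name => $value) {
    echo str_pad($name, 22), var_export($value, true), "\n";
}
```

The script should be requested through the web server rather than run from the command line, because the SAPI and the loaded php.ini can differ between the two.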
4) Current suspected reason
Old assumption:
All three examples rely heavily on big arrays and string operations; that seems to be the common factor. As the implementation works okay-ish on Linux, we suspect this to be a memory problem on Windows. Given there is no database interaction at the pinpointed locations, we don't suspect the database or the server <-> PHP integration to be the problem. Somehow PHP's memory interaction just seems to be slow. Maybe something is interfering with the memory on Windows, making access dramatically slower?
Old assumption 2:
As the same stack runs fine on other Windows machines we assume the problem to be somewhere in the Windows configuration.
New assumption 3:
<block