XtraBackup causing swap despite numa_interleave on mysqld

Jeremy Cole has an excellent article on why a mysqld instance will swap despite proper memory settings on modern hardware. A quick hand-wave of a summary goes: on a NUMA machine, every process, by default, allocates memory from its local node, roughly 1/Nth of the available memory, where N is the number of NUMA nodes, and the kernel may start swapping to disk when that node's memory is consumed, even while other nodes still have memory free.

 

The eventual solution for a large machine running a single mysqld instance is to enable numa_interleave in the mysqld_safe section of the config:

(Screenshot: the numa_interleave setting in the [mysqld_safe] section of my.cnf.)
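A minimal sketch of that section, assuming Percona Server's mysqld_safe, which provides the numa_interleave option:

  [mysqld_safe]
  numa_interleave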

Implicitly, when that flag is set, mysqld is started by mysqld_safe under numactl with a policy that spreads the memory allocation evenly across all nodes ("numactl --interleave=all [… rest of mysqld start command]").

 

There is a subtlety, though, that must be considered: mysqld was just started up and allocated most of the memory on each of the nodes. When every node on a machine is mostly full, no locally running script, such as XtraBackup, can consume significant memory without causing swap, since that script, by default, will only allocate memory on the node it is running on. And what gets swapped out, generally, is the mysqld process.
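To see whether this is happening, per-node memory can be inspected with the numactl tooling (a quick sketch, assuming the numactl package is installed):

  numactl --hardware               # free memory per node
  numastat -p $(pidof mysqld)      # mysqld's allocation per node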

 

The solution, then, is to also interleave any process that is causing swap, as well as any process that can reasonably be expected to consume a significant amount of memory before it causes swap.

 

Before, XtraBackup could look like this:

innobackupex --defaults-extra-file=/path/to/auth --tmpdir=/backup/partition --sla[…]

Now, XtraBackup will look like this:

numactl --interleave=all innobackupex --defaults-extra-file=/path/to/auth --tmpdi[…]

 

I hope this helps resolve swap issues that persist even after mysqld's numa_interleave flag has been enabled. Let me know if there are clarifications I can make.

 

Update Aug 2015: I’ve learned about a few more variables related to numa management. Will provide corrections soon.

Preventing Max Connections Errors with InnoDB

Stop increasing max_connections every time there's a 1040: Too Many Connections error. Every additional connection is one more share dividing the available memory.

The best fix is to manage the workload, but it is also reasonable to make proper use of the available hardware through good server configuration.

There are three relevant server configuration options for managing connection counts as they relate to satisfying web requests.

  1. max_connections – the queue depth
  2. innodb_thread_concurrency – the count of queue consumers
  3. innodb_concurrency_tickets – the amount of work a consumer can do on a query before switching to the next query request

Correctly configuring these three variables, and controlling your workload of course, can prevent 1040 Too many connections errors, assuming your server is not being pushed beyond its configurable capacity.
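For reference, all three live in the [mysqld] section of my.cnf; the values below are purely illustrative, not recommendations:

  [mysqld]
  max_connections            = 500
  innodb_thread_concurrency  = 16
  innodb_concurrency_tickets = 500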

Taken together, these variables have a number of interesting behaviors.

The first typical issue, resource starvation due to large reporting queries, can be mitigated with a proper innodb_concurrency_tickets value when the hardware is too weak to increase the query consumer count (innodb_thread_concurrency). When innodb_concurrency_tickets equals 500, which is the default, five hundred InnoDB rows may be touched before the InnoDB thread moves on to the next query in the queue. A ticket is consumed on any row touch, regardless of whether it is a row read, a uniqueness check, or a foreign-key lookup. Accordingly, setting innodb_concurrency_tickets to 1 means the InnoDB consumer threads will spend a large portion of their cycles context switching between queries, whereas setting innodb_concurrency_tickets to its maximum will starve other, potentially much cheaper, queries of resources. Personally, the default of 500 works just fine for my workload and I've never moved it. A few CPU cycles will be spent context switching, but the fairness is worthwhile for most web apps.

The second typical issue, stampedes of cheap queries causing 1040 errors, can be mitigated with an appropriately sized max_connections value when the hardware is too weak to increase the query consumer count. It's not really an ideal solution, though. Every additional connection allowed by max_connections is another query that may concurrently be doing joins, temporary tables, and the piles of complicated, memory-consuming work that leads to swap, and potentially even the OOM-Killer. Managing workload with good caching, perhaps even proactive caching to manage the Thundering Herd problem, and throttled asynchronous application behavior is a positive solution that scales well.
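A quick way to gauge how close a server actually is to its ceiling before reaching for max_connections (standard status counters):

  SHOW GLOBAL VARIABLES LIKE 'max_connections';
  SHOW GLOBAL STATUS LIKE 'Max_used_connections';
  SHOW GLOBAL STATUS LIKE 'Threads_connected';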

The combination of these issues, as well as modern, multi-core hardware with powerful IO backends, pulls in the third configuration option, innodb_thread_concurrency. As the count of 'query consumers,' innodb_thread_concurrency controls how many separate queries InnoDB will work on at the same time. Depending on your version, the default is either 8 or, in 5.5 and later, 0, meaning no limit at all. Modern OSes are just fine at scheduling, so I like to keep this value high enough that there are at least twice as many consumers available as the number of possible concurrent, expensive queries. Having a large count of query consumers available keeps the cheap queries processed and back out of the queue, allowing the connection count to stay low.

So, what's wrong with just opening up the max_connections and/or innodb_thread_concurrency values to their maximums? It doesn't matter how powerful your hardware is: too many queries all simultaneously consuming an equal share of resources will prevent any of them from completing. Queries not completing and closing their connections will, of course, lead to a backlog of the cheap queries, and potentially a whole new batch of them if the web requests retry the unreturned cheap queries. Eventually, the server goes unresponsive, customers get angry, the business crashes, and you're looking for a new job.

Hopefully it's clear: with the right number of query consumers (innodb_thread_concurrency), the right amount of work per focus (innodb_concurrency_tickets), and enough room for the consumers to drain incoming query requests (max_connections), you'll be able to avoid 1040 Too many connections errors, business closings, and job hunts.

Prepared Statements

Any introduction to Prepared Statements has to start with SQL Statements.

 

Every SQL Statement arriving at a database is processed before it executes and returns a resultset. That processing is a series of computations including parsing, compiling, and query optimization.

The parsing, compiling, and query optimization can lead to a lot of wasted CPU if your queries are very similar, like this example from the widely available test database Sakila:

Example Queries:

  SELECT * FROM address WHERE address_id = 12;

  SELECT * FROM address WHERE address_id = 34;

  SELECT * FROM address WHERE address_id = 56;

Example Code, assuming PDO & PHP, to produce those queries:

  $ids = array(12,34,56);

  $sql = "SELECT * FROM address WHERE address_id = ";

  $databaseResourceHandle = CustomDB::getDBH();

  $rows = array();

  foreach ($ids as $id) {

   $result = $databaseResourceHandle->query($sql . $id);

   $rows = array_merge($rows, $result->fetchAll());

  }

 

Prepared Statements, on the other hand, are an advanced way to process SQL Statements with as little overhead as possible on the database side, trading a little RAM for saved CPU cycles.

 

When comparing Prepared Statements and SQL Statements, it is important to consider both the benefits and drawbacks.

  • Benefit:
    Parameterized variables within the query are immune to SQL injection

  • Benefit:
    Processing overhead only occurs at Prepared Statement Handle instantiation
    This avoids repeating the parsing, compiling, and query optimization on subsequent executions.

  • Benefit:
    PDO (with ATTR_EMULATE_PREPARES) can avoid statement preparation failures caused by a full instance-wide Prepared Statement Handle pool

  • Drawback:
    Prepared Statement Handles consume handles from an instance-wide pool that can lead to failure when all handles in that pool are allocated

  • Drawback:
    From the MySQL docs: 'Prepared Statement allocation is specific to a session until the end of a session or deallocation.' Allocating prepared statements in stored procedures therefore risks memory leaks. Also, do not use stored procedures, but that topic is outside the scope of this discussion.

  • Drawback:
    Prepared Statement Handles consume memory on the mysql server. There is no visibility into the amount of memory they consume at any level.

 

Prepared Statements are a way to avoid paying the overhead of standard SQL Statements more than once. The actual code implementation is very different, and for good reason: you pass the query in with a parameter placeholder for whatever part is going to change in future executions of the SQL Statement:

Example Queries:

  SELECT * FROM address WHERE address_id = 12;

  SELECT * FROM address WHERE address_id = 34;

  SELECT * FROM address WHERE address_id = 56;

Example Code, assuming PDO & PHP, to produce those queries using Prepared Statements:

  $ids = array(12,34,56);

  $sql = "SELECT * FROM address WHERE address_id = :address_id";

  $databaseHandle = CustomDB::getDBH();

  $rows = array();

  $preparedStatementHandle = $databaseHandle->prepare($sql);

  foreach ($ids as $id) {

    $preparedStatementHandle->execute(array('address_id' => $id));

    $result = $preparedStatementHandle->fetchAll();

    $rows = array_merge($rows, $result);

  }

 

 

The above code is the simplest way to set up efficient Prepared Statement handling, but it’s still mostly useless. The problem here is that my example can easily be rewritten into an IN clause:

  SELECT * FROM address WHERE address_id IN (12,34,56);

This rewritten SQL Statement needs only one network round trip and a single round of processing for all of the answers.

 

So when is a Prepared Statement useful?

Prepared Statements are useful when the handles are cached for the life of the session inside a database wrapper that is just good enough to programmatically generate most typical lookups.

Unfortunately, “good enough” caching looks complicated:

Example Queries:

  SELECT * FROM address WHERE address_id = 12;

  SELECT * FROM address WHERE address_id = 34;

  SELECT * FROM address WHERE address_id = 56;

Example Code, assuming PDO & PHP, to produce those queries using Prepared Statements with caching:

  $ids = array(12,34,56);

  $sql = "SELECT * FROM address WHERE address_id = :address_id";

  $databaseHandle = CustomDB::getDBH();

  Cache::setStrategy(Cache_LRU::getLabel());

  $rows = array();

  if (!Zend_Registry::isRegistered(self::PREP_STMT_CACHE_KEY)) {

    Zend_Registry::set(self::PREP_STMT_CACHE_KEY, Cache::getInstance());

  }

  $preparedStatementHandle = Zend_Registry::get(self::PREP_STMT_CACHE_KEY)->get($sql);

  if (!$preparedStatementHandle) {

    $preparedStatementHandle = $databaseHandle->prepare($sql);

    // Use the sql itself as the index/hash

    Zend_Registry::get(self::PREP_STMT_CACHE_KEY)->set($sql, $preparedStatementHandle);

  }

  foreach ($ids as $id) {

    $preparedStatementHandle->execute(array('address_id' => $id));

    $result = $preparedStatementHandle->fetchAll();

    $rows = array_merge($rows, $result);

  }

 

How can Prepared Statements fail?

  • Prepared Statements only exist for the current session, so holding onto a handle after a session closes will lead to failures
  • Each Prepared Statement consumes a handle from the instance's Prepared Statement pool, which is sized by max_prepared_stmt_count
  • Out Of Memory on the client side

 

Case 1:

A connection closes while a cache of Prepared Statement Handles exists

Solution:

Give your PDO wrapping class a __destruct method that clears the relevant cache of Prepared Statement Handles before calling the parent's destructor.
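A minimal sketch of such a destructor, assuming the Zend_Registry-backed cache from the examples above and a hypothetical clear() method on the cache object:

  public function __destruct() {
    if (Zend_Registry::isRegistered(self::PREP_STMT_CACHE_KEY)) {
      // Drop statement handles cached for this connection before it closes;
      // clear() here stands in for whatever flush method your cache exposes
      Zend_Registry::get(self::PREP_STMT_CACHE_KEY)->clear();
    }
    parent::__destruct(); // assumes the parent wrapper defines a destructor
  }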

 

Case 2:

The max_prepared_stmt_count value is reached on a database

Solution:

Immediately drop all local caches of Prepared Statements and try again. If there is still an issue, activate PDO's ATTR_EMULATE_PREPARES attribute to silently convert calls to ->prepare() and ->execute() into standard SQL Statements.

Note:

It appears that automatic escaping is retained in this circumstance.
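A minimal sketch of the fallback, assuming the same $databaseHandle as in the earlier examples:

  // Fall back to client-side (emulated) prepares so ->prepare() no longer
  // consumes a server-side handle counted against max_prepared_stmt_count
  $databaseHandle->setAttribute(PDO::ATTR_EMULATE_PREPARES, true);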

 

Case 3:

Out Of Memory (OOM) on the client side.

Solution:

Reduce the Prepared Statement Handle Cache size. The cache does not have to be large if it is well managed. Even my company’s complicated webapp’s web requests do not fill a 200-statement-long FIFO cache.

 

 

Remaining Points:

  1. Monitor (e.g. with Nagios) Prepared_stmt_count vs max_prepared_stmt_count (see the SHOW statements after this list)

  2. Monitor (e.g. with StatsD) the Prepared Statement Handle Cache hit, miss, and purge rates.

  3. The minimal extra overhead of an LRU is only worthwhile over a simple FIFO if your Prepared Statement Handle Cache is too small to hold all the queries that should be cached.
    Note: Your cache should be small due to the unbounded and invisible memory consumption of Prepared Statement Handles on the database server

  4. A best-case Prepared Statement instantiation against localhost with a simple select costs me, on average, about 300 microseconds. Pulling a handle from a cache is about 6 microseconds.

  5. Coworkers have shown me that long lists of Named Parameters (e.g. ":id1, :id2, :id3, […]") get more expensive as the list grows, whereas long lists of Ordered Parameters (e.g. "?,?,?,[…]") remain cheap even in large numbers. Numerically quantifying this slowdown will be a future post.

  6. Ordered Parameters’ values are not decoded in SHOW PROCESSLIST. Named Parameters’ values are displayed, however, which makes them, to me, far preferable. 
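For point 1, the raw numbers for that monitoring come straight from the server:

  SHOW GLOBAL STATUS LIKE 'Prepared_stmt_count';
  SHOW GLOBAL VARIABLES LIKE 'max_prepared_stmt_count';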

Finally, note that the last implementation is still not ideal: Prepared Statement Handles should be contained within a connection object for proper management (e.g., destruct on disconnect).

Keep It Simple

This post is about keeping it simple when it comes to managing a company's databases. Simplicity lets me easily diagnose problems, quickly set up new databases, readily validate my actions while configuring replication, and identify the right course of action even at two in the morning.

Simplicity in my infrastructure allows me to diagnose problems. It is difficult to tell whether a past network spike came from a Memcached instance or a Postfix instance hosted on the same machine. Service segregation, meaning leaving a single host responsible for a single heavy process, significantly aids diagnosis because standard, rapidly deployed graphs in any of a number of RRDTool implementations can show host-wide performance statistics that can only have come from the single heavy process that machine is responsible for running.

Simplicity in my instance configuration has made it easy to add another instance. The current time from request to implementation is about ten minutes plus a pair of code reviews for Puppet. Every configuration file is identical save for the last two lines, which are host specific and related to replication.

Standardizing replication log file names such that they are based on the hostname provides two excellent benefits. The first benefit is that I am able to rapidly validate my connection within the same command line prompt where I am making replication changes. Switching between windows and tabs to validate that I’m connected to the correct machine produces an unnecessary risk that I will switch back to the wrong machine. Having the logfile related to the host I am working on allows for easy validation. The second benefit is that I am able to provide programmatic protection against human error. Even when connected to the wrong machine, having a host-specific file name means the replication session will try to read files that do not exist, ensuring that improper replication events will not be played on the wrong host.
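A sketch of what that naming looks like in my.cnf, with a purely illustrative hostname of db123:

  [mysqld]
  log-bin   = /var/lib/mysql/db123-bin
  relay-log = /var/lib/mysql/db123-relay-bin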

Simplicity in my application configuration allows me to do a failover in five minutes and shove the cleanup tasks into the surrounding half hour. I'd be happier with a faster failover and cleanup, but my team finds the current system acceptable because we (knock on wood) haven't had cause to perform an immediate, emergency failover in well over a year. At the time of this writing, we maintain a list of IPs in a file inside the application code base. The list of IPs is always available wherever the code exists, does not require a database connection, and is easily changed because there is no need to cache the data at all. It's a simple solution when compared to keeping shard information in a database and writing a complex caching layer for that IP data.

Keeping it simple is a goal I've strived for in my production environments. Failovers are an IP change, replication settings are as simple as I can safely make them, setup is a new file in Puppet, and database hosts follow service segregation by only running MySQL/Percona Server. It keeps my nights pretty free of Nagios alerts. :)