M* Performance

Know your database

by Ivan Chepurnyi

* - Applicable for M2, M1 and MySQL

 

 

 

 

 

 

 

About Me

  • One of lucky former members of Magento 1.x Core Team
  • I love to optimize performance
  • I give trainings and architectural advisories

This presentation may contain some boring numbers

So if you don't like the boring numbers, just ignore them :)

Majority of all bottlenecks in PHP applications are coming from interactions with database

So we need to give some

❤️

MySQL

MySQL is just an application

  • It uses RAM
  • It needs CPU
  • And it has feelings...

So we need to stop executing queries like this on big databases on every customer request...


SELECT `main`.`entity_id`,
      IF(attribute_is_active_scope.value_id IS NOT NULL, attribute_is_active_scope.value, attribute_is_active_default.value) AS `is_active`,  
      `attribute_firstname_default`.`value` AS `firstname`, 
      `attribute_lastname_default`.`value` AS `lastname`, 
      /* ... ommited */
    INNER JOIN `entity_int` AS `attribute_is_active_default` ON attribute_is_active_default.entity_id = main.entity_id AND attribute_is_active_default.attribute_id = '15' AND attribute_is_active_default.scope_id = 0
	LEFT JOIN `entity_int` AS `attribute_is_active_scope` ON attribute_is_active_default.entity_id = attribute_is_active_scope.entity_id AND attribute_is_active_default.attribute_id = attribute_is_active_scope.attribute_id AND attribute_is_active_scope.scope_id = '1'
    INNER JOIN `entity_varchar` AS `attribute_firstname_default` ON attribute_firstname_default.entity_id = main.entity_id AND attribute_firstname_default.attribute_id = '1' AND attribute_firstname_default.scope_id = 0
    INNER JOIN `entity_varchar` AS `attribute_lastname_default` ON attribute_lastname_default.entity_id = main.entity_id AND attribute_lastname_default.attribute_id = '2' AND attribute_lastname_default.scope_id = 0
    INNER JOIN `entity_varchar` AS `attribute_email_default` ON attribute_email_default.entity_id = main.entity_id AND attribute_email_default.attribute_id = '5' AND attribute_email_default.scope_id = 0
    INNER JOIN `entity_varchar` AS `attribute_country_default` ON attribute_country_default.entity_id = main.entity_id AND attribute_country_default.attribute_id = '9' AND attribute_country_default.scope_id = 0
    LEFT JOIN `entity_varchar` AS `attribute_country_scope` ON attribute_country_default.entity_id = attribute_country_scope.entity_id AND attribute_country_default.attribute_id = attribute_country_scope.attribute_id AND attribute_country_scope.scope_id = '1'
    INNER JOIN `entity_datetime` AS `attribute_dob_default` ON attribute_dob_default.entity_id = main.entity_id AND attribute_dob_default.attribute_id = '7' AND attribute_dob_default.scope_id = 0 WHERE (IF(attribute_is_active_scope.value_id IS NOT NULL, attribute_is_active_scope.value, attribute_is_active_default.value) = 1)
          

This queries can eat all the RAM and bring your database server down

MySQL performance tuning should start from optimizing slow queries

Increasing RAM and buffer pool does not solve a problem

Optimization starts from simplifying your queries

Ways to simplify your queries for MySQL

  1. Separate JOINs into separate selects, that are based on main select primary key
  2. Try to minimize JOIN conditions within simplified one
  3. Create materialized index tables for filtration, instead of filters on join
  4. Search for easy override priority for your case

Different Data Retrieval Approaches

Why Separate is better?

  1. It doesn't create a temporary table
  2. It uses index instead of a table scan
  3. It works with only small subset of data, instead of joining two million records tables

How to make a separate query

Select identifiers


  SELECT entity_id FROM entity_flat 
      WHERE is_active = 1 and store_id = 1 
      LIMIT 0, 100

Select data


  SELECT entity_id, attribute_id, scope_id, value
      FROM entity_varchar 
      WHERE entity_id IN(1, 2, 3 ....) 
UNION ALL
SELECT entity_id, attribute_id, scope_id, value
      FROM entity_decimal 
      WHERE entity_id IN(1, 2, 3 ....)
ORDER BY scope_id
      

How about indexation?

MySQL Memory

MySQL memory consuption is increasing with amount of records you need to update.

These are known MySQL serial killers:

INSERT INTO .. SELECT ... FROM
UPDATE ... JOIN ... SET

Ranged Queries to the Resque

  1. You just need to find min / max pair and total number of primary key matches
  2. Find your perfect batch size (~ 20k)
  3. Execute same query by moving forward with batch size

Single vs Ranged Approaches

How to make a ranged query

Select boundaries


  SELECT MIN(entity_id), MAX(entity_id)  FROM entity

Create ranged inserts


  INSERT INTO `entity_flat`
    (`entity_id`, `scope_id`) 
    SELECT `main`.`entity_id`, '4' AS `scope_id` 
        FROM `entity` AS `main` 
        WHERE (`main`.`entity_id` >= 1) AND (`main`.`entity_id` < 20001)
      

Flat Index Data Purpose

  • Filter Data
    Fast selection with indexed where condition
  • Sort Data
    Fast ordering with indexed field order

Flat Data's main purpose is not a retrieval

Small Flat vs Full Flat Generation

Pager problem

As more data you have, as slower it is to navigate to the last page

Problem in this query


SELECT * FROM `entity_flat_data` 
    WHERE (is_active = 1) AND (scope_id = '5') 
    ORDER BY `firstname` ASC LIMIT 50 OFFSET 450121
   			

Solution

Use a separate query for limit

Query Paging Approaches

How to make efficient pager query

Select ids at particular page


  SELECT `entity_id` FROM `entity_flat`
      WHERE (scope_id = '2') AND (is_active = 1) 
      ORDER BY `firstname` ASC LIMIT 100 OFFSET 8881

Select the data


  SELECT `entity_flat_data`.* FROM `entity_flat_data` 
     WHERE (scope_id = '2') 
     	AND (entity_id IN(3389, 7032, 2223, 724, 5076 ... ))
      

Exporting big data-sets?

Data Export Approach

How to make efficient data export query

  1. For huge result-set, create a memory table with primary key values from main select
  2. Join that memory table into data select
  3. Drop all created memory tables after export is done

Benchmark code

You can download benchmark base from this repository: https://github.com/EcomDev/mysql-performance-benchmark

Take-aways

  • Avoid joins if possible
  • Separate queries over single one
  • Create indexers
  • Use memory table for large datasets export

Thank You

Q&A