We manage an expired and expiring domains search engine called DomCop which runs on a 2 GB Linode instance on the LEMP stack. We process 250,000 domains every day and there are about 20+ cron scripts that run continuously importing domains into a table called err…‘domains’ and updating the records with various metrics.
Speed is paramount for a search engine and hence making a user wait for a query to complete leads to a very poor user experience. In the first few months of launching DomCop we faced many scaling issues as our user base grew. Here is what happened…
Smooth beginnings with MyIsam Table Type
We had worked with MyIsam for many projects previously and no matter how large the table was MyIsam worked perfectly. However, what we did not know is that anytime a MyIsam table row gets updated, the entire table gets locked. This is because MyIsam does not support row level locking. The 20+ cron scripts were running queries to update the same table causing the whole system to run at snail’s pace.
Moving to the Pro: Innodb
Innodb is the logical choice for a write heavy system as it supports row level locking. The domain search was not lightning fast, but it wasn’t too bad. However as we added more domain sources, the table size grew to a large 7GB (8 million records). The searches for this table became increasingly slow to a point where some would take up to a few minutes to complete, which was totally unacceptable to us.
Alternatives that didn’t work…
We thought about using some sort of cache like Memcached and Redis to make sure the customer queries would run faster. However, the user can search on more than 30 different columns and a key value store would not give us this flexibility.
Where have you been all my life, Memory Table?
In our research we chanced upon the MySQL Memory Table type. There are a few important points to note about Memory Tables.
- Table data is stored completely in memory.
- When MySQL restarts, all data is completely erased.
- They work really well if you have a read heavy system, since they do not have row level locking
What we did was create a new memory table with the Memory Table type. Every few hours we ran a script that would automatically copy data from the domains table to the memory table. Now the background processes could continue to work on the Innodb domains table while the users could easily read from the memory table. Even our really complex domain search queries (using REGEXP) were now completed in 0.3 seconds!
Memory Table Optimization Tips
- Varchars take up the space for all the chars defined. Therefore a VARCHAR(100) would take up four times the space of a VARCHAR(25). While dealing with MyIsam and Innodb tables this is not an issue, it is important to reduce the size of VARCHARS as much as possible to save memory. Same goes with using BigInt and Int where a Tiny Int would do.
- Since the domains table had data for about 30 days and the users were interested only in the data available over the following 7 days, we only store data worth 7 days in the memory table. This reduced table size to a mere 300MB giving us more free RAM.
Despite the fact that some of us here have been working with MySQL for over 10 years we did not see the easy solution staring us in the face. We could almost hear our database mock us in Ygritte’s voice “You know nothing, Jon Snow”