Posts Tagged ‘case sensitivity’

String searching in MySQL

Tuesday, March 25th, 2008

So today I ran across a line that looked like:

Merchant.find(:first, :conditions => [”LOWER(url) = ?”,  …])

Since I’m kind of on an optimization binge, I decided to make sure this was indexed.  No point in sucking down an entire table if we just want to look up one word.

We have an index on merchants.url, but if you know SQL, you already know the answer: no, it doesn’t use the index for this.  I ran EXPLAIN to verify that, and then looked up the general rule: apparently if you use an expression, it doesn’t use an index.

No big deal: just make an index on LOWER(url), right?  Sorry, no, SQL doesn’t allow indexes on functions of columns.  (Actually, it’s a trick question: SQL itself doesn’t have indexes at all.  PostgreSQL can create an index on an expression, but support for this is not at all consistent across databases.)

It turns out the answer is really simple: MySQL is case-insensitive by default.  Removing the LOWER() call returns the same result, but using the index, at least for the things I tried.

But what about non-ASCII characters?  We do go to the trouble to say “DEFAULT CHARSET=utf8″, because we don’t want to exclude, say, people with Japanese names.  Then it gets more complicated.  MySQL has “collators” which seem to be responsible for determining how to case-insensitively compare strings.  By default, it’s … “latin1_swedish_ci”.

So if you’re wondering why you’re typing in a Turkish name and it’s not quite working as you expect, that’s because we’re assuming you’re Swedish.  Nothing personal.