Find Top-Level Domain (TLD) With MySQL

I was stuck with a problem of extracting Top-Level Domain (TLD) from a column in MySQL database. TLD is the last part of the domain name, example is the .com in google.com. As TLD varies in size, you may have you have .io (popular with startup), .com, or .name, how to you extract this part out without using any code but only SQL? Why SQL? Because it’s easier to group domain by TLD using GROUP BY clause.

Luckily, MySQL’s SUBSTRING_INDEX function can solve this problem easily. As the function name implies, extract a subset of a string into an index. First, let’s look at the function definition and parameters.

SUBSTRING_INDEX(str, delim, count)
  • str, the string we want to perform the action
  • delim, the delimiter we want to break the string into an index
  • count, maximum number of item from left (if positive), or from right (if negative) return

Examples:
mysql > select substring_index('111.222.333.444.555', '.', '1');
-> 111

mysql > select substring_index('111.222.333.444.555', '.', '2');
-> 111.222

Note : Negative count return result from the right
mysql > select substring_index('111.222.333.444.555', '.', '-1');
-> 555

mysql > select substring_index('111.222.333.444.555', '.', '-1');
-> 444.555

Back to my original question, how do we extract different TLD from a domain name?
mysql > select substring_index('www.google.io', '.', '-1');
-> io

mysql > select substring_index('www.google.com', '.', '-1');
-> com

mysql > select substring_index('www.google.name', '.', '-1');
-> name

Unfortunately, this only work with first level TLD and not second or later level like www.yahoo.co.jp where the TLD is co.jp.

No comments:

Post a Comment