This Week I Learned - 2017 Week 22

Last week's post or the whole series.

Slow week. Lots of pending stuff needs to be cleared off.

One of the best thing in learning, always be the most stupid person in a class and ask a lot of questions. You never know you will learn something new or relearn something old. Or it may reaffirm some beliefs you always held truly to yourself.

#1 If you need to display CJK characters and emojis in MySQL, use 'utf8mb4_unicode_520_ci'. Historically (another good write-up), MySQL support 'utf8mb3', later changed to 'utfmb4', and changed again to 'utf8mb4_unicode_520_ci' (since MySQL 5.7), and lastly 'utf8mb4_0900_ai_ci' (since MySQL 8.0). Sometimes you wonder why you still stuck with the curse of MySQL where there are better solution out there.

`utf8mb4` is the superset of `utf8`. Meaning that the former also includes the later character set.

To show the supported character set in your MySQL installation.

First check your MySQL version.
mysql> SHOW VARIABLES LIKE "%version%";
+-------------------------+-------------------------+
| Variable_name           | Value                   |
+-------------------------+-------------------------+
| innodb_version          | 5.7.18                  |
| protocol_version        | 10                      |
| slave_type_conversions  |                         |
| tls_version             | TLSv1,TLSv1.1           |
| version                 | 5.7.18-0ubuntu0.17.04.1 |
| version_comment         | (Ubuntu)                |
| version_compile_machine | x86_64                  |
| version_compile_os      | Linux                   |
+-------------------------+-------------------------+
8 rows in set (0.33 sec)

Next list our the available character sets for `utf8mb4`.
mysql> SHOW COLLATION WHERE Charset = 'utf8mb4';
+------------------------+---------+-----+---------+----------+---------+
| Collation              | Charset | Id  | Default | Compiled | Sortlen |
+------------------------+---------+-----+---------+----------+---------+
| utf8mb4_general_ci     | utf8mb4 |  45 | Yes     | Yes      |       1 |
| utf8mb4_bin            | utf8mb4 |  46 |         | Yes      |       1 |
| utf8mb4_unicode_ci     | utf8mb4 | 224 |         | Yes      |       8 |
| utf8mb4_icelandic_ci   | utf8mb4 | 225 |         | Yes      |       8 |
| utf8mb4_latvian_ci     | utf8mb4 | 226 |         | Yes      |       8 |
| utf8mb4_romanian_ci    | utf8mb4 | 227 |         | Yes      |       8 |
| utf8mb4_slovenian_ci   | utf8mb4 | 228 |         | Yes      |       8 |
| utf8mb4_polish_ci      | utf8mb4 | 229 |         | Yes      |       8 |
| utf8mb4_estonian_ci    | utf8mb4 | 230 |         | Yes      |       8 |
| utf8mb4_spanish_ci     | utf8mb4 | 231 |         | Yes      |       8 |
| utf8mb4_swedish_ci     | utf8mb4 | 232 |         | Yes      |       8 |
| utf8mb4_turkish_ci     | utf8mb4 | 233 |         | Yes      |       8 |
| utf8mb4_czech_ci       | utf8mb4 | 234 |         | Yes      |       8 |
| utf8mb4_danish_ci      | utf8mb4 | 235 |         | Yes      |       8 |
| utf8mb4_lithuanian_ci  | utf8mb4 | 236 |         | Yes      |       8 |
| utf8mb4_slovak_ci      | utf8mb4 | 237 |         | Yes      |       8 |
| utf8mb4_spanish2_ci    | utf8mb4 | 238 |         | Yes      |       8 |
| utf8mb4_roman_ci       | utf8mb4 | 239 |         | Yes      |       8 |
| utf8mb4_persian_ci     | utf8mb4 | 240 |         | Yes      |       8 |
| utf8mb4_esperanto_ci   | utf8mb4 | 241 |         | Yes      |       8 |
| utf8mb4_hungarian_ci   | utf8mb4 | 242 |         | Yes      |       8 |
| utf8mb4_sinhala_ci     | utf8mb4 | 243 |         | Yes      |       8 |
| utf8mb4_german2_ci     | utf8mb4 | 244 |         | Yes      |       8 |
| utf8mb4_croatian_ci    | utf8mb4 | 245 |         | Yes      |       8 |
| utf8mb4_unicode_520_ci | utf8mb4 | 246 |         | Yes      |       8 |
| utf8mb4_vietnamese_ci  | utf8mb4 | 247 |         | Yes      |       8 |
+------------------------+---------+-----+---------+----------+---------+
26 rows in set (0.00 sec)

However conversion from `utf8` to `utf8mb4` will encounter these issue of `ERROR 1709 (HY000): Index column size too large. The maximum column size is 767 bytes`. There is a DAMN good guide (ironically better than official documentation) written to resolve that issue and planning for the migration. However, the SQL queries only works for MySQL 5.6.

Meanwhile, this is one good example of writing Perl unit test cases to evaluate `utf8mb4` character set.

#2 Found the solution to my long unanswered question. When writing Bash script, how do we skip certain parameter. For example, we want to skip first parameter but takes the second $2 parameter onward? `shift` my friend. Is a universal concept that I've looked at it almost every day in Perl but never occur to me that this is the same in Bash script as well.

#3 PostgreSQL 10 with examples. The only worthy stuff produced by HP besides the printer. People, embrace the most advanced FOSS database system. Stop playing with MySQL.

#4 Time your console commands but cannot capture the output of `time` to a file? You're not using the command correctly. You should surround it with parenthesis. Either one will work. Depends on how many output files you want to have.
$ (time ls) > outfile 2>&1
$ (time ls) > ls_results 2> time_results

#5 How do you lost 1kg per week or the perfect diet? Both gave the same good advice. It actually make sense without actually starving yourself. The key take here is to first figure out the daily needed calories. Then reduce 500 calories from your food intake and add activities that burn 500 calories as well. The second part make sense. You just don't focus on diet, you should incorporate physical activities as well.

#6 "How do I get started with artificial intelligence?" and "What can I do with AI in my own product or company?" Read this.

#7 The technology stack of craiglist. Surprising, they are using Sphinx Search Server. Thought they might be using Elasticsearch. Why? Based on the comparison on both search servers, Elasticsearch scores way higher than Shpinx. Maybe Sphinx fits the requirement usages in craiglist?

No comments:

Post a Comment