Stuck with Conservative GNU/Linux Distros

I'm stuck with conservative GNU/Linux distros (CentOS or Debian stable) which don't let you update to more current LAMP-stack packages. No, I don't want to compile from source code. What should I do?
For Debian stable, use Dotdeb. For CentOS, use the IUS Community repository. IUS stands for Inline with Upstream Stable. This third-party RPM repository is "sponsored by internal work at Rackspace (but officially unsupported)".

What if I'm stuck with a hosting panel like cPanel, Plesk, or DirectAdmin?
Pray hard. Pray very hard. You're at the mercy of your hosting provider.

Hiring New Developers

"Once we've received the code, we review it for the above stated qualities with an emphasis on clean, readable and well-tested code. We're not too hard on these code reviews, we just want to try to eliminate obviously poor fits. People who write no tests are immediately eliminated. Hugely over-architected solution? Eliminated. Code is bad enough the person might not actual be able to program? You get the idea."
-- prophetjohn, emphasis mine
So their procedure for hiring a new developer is:

1. Send a sample coding project.
2. Check for tests, not over-engineered code, and the ability to code.
3. Coding interview with a real project (something the company is currently working on). Paired code review and refactoring with existing developers.
4. Informal group interview by the developer team.

Invalid UTF-8 sequence in Subversion

I encountered this when I tried to update my local working copy. The error message was as follows:
$ svn up

svn: Valid UTF-8 data
(hex: 49 50 )
followed by invalid UTF-8 sequence
(hex: a0 2d 56 69)

No idea which file(s) were causing this. Googled around for a quick solution and found that you need to go into every folder and type svn info to track down the culprit file. Not a good solution.

Second approach: try to narrow down the scope, using svn log -v to trace back recently committed files.

Googled around for an answer. I suspected some of the committed files were encoded in a different charset, so I checked using the file -bi command as shown. Not helpful at all.
$ file -bi config.php

text/x-c++; charset=us-ascii

Google again. Interesting solution proposed by cooper,
"strace svn status will give you the name of the offending file. unfortunately, svn care about name of files that are in one of its directories, even if it’s not under revision."
Tried again using strace, a utility to monitor system calls [4]. Voilà, the offending directory is highlighted in bold.
$ strace svn status

*open("pdf", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3*
fstat(3, {st_mode=S_IFDIR|0777, st_size=278528, ...}) = 0
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
getdents(3, /* 38 entries */, 4096)     = 4088
write(2, "svn: Valid UTF-8 data\n(hex: 49 5"..., 122svn: Valid UTF-8 data
(hex: 49 50)
followed by invalid UTF-8 sequence
(hex: a0 2d 56 69)
) = 122
close(3) 

It seems the offending files were located in the pdf folder. I removed that folder and tried svn status again.

Whoa! Problem solved. The root cause was that certain generated PDF files had names in an unsupported encoding.
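
For next time, the hunt itself could be scripted instead of reading strace output. Below is a minimal sketch, assuming the working copy lives on a filesystem that stores file names as raw bytes (as on Linux). The function name find_invalid_utf8_names is my own and not part of Subversion.

```python
import os


def find_invalid_utf8_names(root):
    """Return paths (as bytes) whose file or directory name is not valid UTF-8."""
    bad = []
    # Walking with a bytes path makes os.walk yield raw bytes names,
    # so nothing gets decoded (or silently escaped) behind our back.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(os.path.join(dirpath, name))
    return bad
```

Calling find_invalid_utf8_names(".") from the top of the working copy should list the same kind of names that svn complained about, without having to delete a folder first.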

Seriously, I fricking love strace!

Better Approach for Hiring Programmer

"The only significant difference between the two hiring pushes was the approach--the first was focused entirely on a candidate who had n years of experience on platform X; the second focused on candidates who showed the greatest talent and aptitude in previous professional and personal projects, and gave the greatest indications they were a great fit for the personality of the team."
......
"The most important lesson I learned when I worked with a formal HR department to hire new people is that they care. They really do. They don't want to hire terrible candidates who are going to be miserable, hate their jobs, and drag down a team. They want to empower people to improve professionally while taking care of a company's needs."
-- bobwaycott, emphasis mine
In other words, forget about hiring people with n years of experience and instead hire people who care about their work and are eager to learn?

Finally, 10,000 Steps!

To be precise, I walked 12,146 steps today. That translates to 9.4 km, 535 kcal, or 34.2 g. It took me several months to reach this goal. It is nice to see that cheering symbol \o/ displayed on my pedometer. The first 6,000 steps came from my midnight walk (bad idea, I was down with the flu). The subsequent 6,000 steps came from doing housework and walking to dinner. A big improvement over my average of 5,000 to 7,000 steps per day.

Three things helped me reach this goal.

1. Pedometer, a step-counting device. The device showed me how pathetic my daily step count was. The results are used as a baseline to work out what adjustment is needed to reach 10,000 steps.

2. Blood test. You need to be very concerned when certain measurements have reached borderline figures.

3. Great walking shoes. Major motivator. My feet have never felt so comfortable and pampered. What was I wearing all my life? Save money and get yourself a good pair of walking shoes; your feet deserve it. A good pair of socks as well.

What are my next steps? Try to walk 10,000 steps daily and get a better measuring device. Still deciding between the Fitbit Flex, Nike+ FuelBand, and Jawbone Up.

Programmer Mortality

"Damn, there's a lot of early software people death lately..."
-- ttrreeww
That's so true. I went to his Python talk last month and now he's gone. Another painful reminder that life is so fragile. I was surprised to find out (unconfirmed) that he was in his early 40s, just a few years older than me.

On a related note, a few weeks ago I ran into an old acquaintance. As we were catching up, I noticed he looked different, way thinner and tired. He told me he was recently diagnosed with a certain illness. I was taken aback by this but not entirely surprised, knowing his caffeine-fuelled lifestyle.

I told Ms Hippopotamus about this and finally convinced (frightened and threatened) her to go for blood testing with me after so many years. The test results came back unsatisfactory and there will be a follow-up session for confirmation. No, it is not life-threatening, but it is alarming. I can probably guess what is going to kill us in the future if we don't make any adjustment to our lifestyle. Sitting all day will slowly kill you.

Just take care of yourself. Walk more, sit less. "Eat food. Not too much. Mostly plants." (Michael Pollan, food writer). Or maybe I should stop being a software developer and do something else instead? Something that involves more manual labour, like farming?

PayPal Payment Status Stuck in Pending Due to Unilateral

It took me fricking 4 hours of repeated trial and error to find out why PayPal kept giving me this result:
payment_status: Pending
pending_reason: unilateral

Until I read the actual documentation (note to self: pay attention and read the damn documentation) on the meaning of unilateral:

The payment is pending because it was made to an email address that is not yet registered or confirmed.

Then it occurred to me that I had just created the new merchant account and
the email address was not confirmed yet. Once I confirmed it (see screenshot), the payment status changed to
payment_status: Completed
pending_reason (this field was not returned)

This frustrated me a lot since I remember solving the same damn issue a few months ago. Damn frustrating.
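
To stop myself from losing another 4 hours, the notification handler could spell out the pending reason instead of just logging it. A minimal sketch, assuming the notification arrives as a URL-encoded form body containing the payment_status and pending_reason fields shown above. The describe_payment helper and the PENDING_REASONS table are made up for illustration and only cover the one reason quoted from the documentation.

```python
from urllib.parse import parse_qs

# Only the reason quoted in this post; other reasons fall through to a generic hint.
PENDING_REASONS = {
    "unilateral": (
        "The payment is pending because it was made to an email address "
        "that is not yet registered or confirmed."
    ),
}


def describe_payment(body):
    """Turn a URL-encoded notification body into a human-readable status line."""
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    status = fields.get("payment_status", "")
    if status != "Pending":
        return status
    # pending_reason is only expected when the status is Pending.
    reason = fields.get("pending_reason", "unknown")
    hint = PENDING_REASONS.get(reason, "see the PayPal documentation")
    return "Pending ({}): {}".format(reason, hint)
```

With this in the log, describe_payment("payment_status=Pending&pending_reason=unilateral") would have pointed straight at the unconfirmed email address.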



Still remember the new PayPal developer portal mentioned a few days ago? Basically it is just old wine in a new bottle: the interface was simplified but the experience of handling the sandbox is still as frustrating and confusing.

When will Stripe or Paymill (a clone of Stripe) ever come to Southeast Asia? Why can't we (developers) have a simple, stable, and secure payment gateway?

Using GeoIP with PHP - Part 4

It was a fruitful journey trying to install the GeoIP extension. These are some lessons learned from all three previous posts.

If you’re running Ubuntu on a live production server, stick to a Long Term Support (LTS) release. Be very conservative. The same applies to RPM-based distros: use CentOS and not Fedora for production.

Do not trust the stability of the extension bundled with the distro. Ubuntu/Debian packages always lag a bit behind the latest fixes. Install your PHP extension using PECL.

Always test your installed PHP extension using the test cases that come with the source code. The greatest benefit is that these test cases show both the stability of the extension and the correctness of your PHP configuration. From now on, any new LAMP installation will be tested with the sample test cases in the PHP source.

Read the bloody changelog. Google the proper keywords. It never occurred to me to google the culprit function name, geoip_db_get_all_info().

Learn some C and how to debug with gdb. I still don’t quite understand C and can’t really trace the issue back to the original line of code. Something to improve upon in the future.

I have been using PHP professionally for many years and yet there is still a lot to learn.

Using GeoIP with PHP - Part 3

Continuing from Part 2. We will try another approach using a plain PHP library file instead of apt-get or pecl as shown in Part 1 and Part 2. This method is suitable if you don’t have full or root access to your machine, for example on shared hosting.

Clone the geoip api library.
$ git clone https://github.com/maxmind/geoip-api-php

Try running one of the example scripts.
$ cd geoip-api-php
$ php -f sample_city.php
PHP Fatal error:  Cannot redeclare geoip_country_code_by_name() in /home/kianmeng/project/geoip-api-php/geoip.inc on line 439

Yup, function name conflict. In PHP, all functions live in the global namespace by default, hence the name collision. This is why they introduced the ugly hack of namespaces in PHP to solve this problem. We have to remove the PECL-installed geoip extension from Part 2.
$ sudo pecl uninstall geoip
$ sudo rm -rf /etc/php5/conf.d/geoip.ini

Download and uncompress the sample data file.
$ wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz
$ gunzip GeoIP.dat.gz

Create this sample test code (test_geoip.php).
<?php
require_once "geoip.inc";

$gi = geoip_open("GeoIP.dat", GEOIP_STANDARD);
echo geoip_country_code_by_name($gi, "php.net"), "\n";
geoip_close($gi);

Run the script.
$ php -f test_geoip.php
US

Another similar sample test code (test_cached_geoip.php), but using shared memory caching.
<?php
require_once "geoip.inc";

geoip_load_shared_mem("GeoIP.dat");
$gi = geoip_open("GeoIP.dat", GEOIP_SHARED_MEMORY);
echo geoip_country_code_by_name($gi, "php.net"), "\n";
geoip_close($gi);

The geoip extension and the geoip library cannot be used side by side. The extension is newer and more up-to-date. If you need a quick IP-to-country lookup but have no root access to your machine, just use the library.
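
For the curious, an IP-to-country lookup can be pictured as a search over sorted address ranges. The sketch below is only a conceptual illustration in Python. I am not claiming this is how GeoIP.dat is laid out or how geoip.inc reads it. The table is hypothetical: it uses the reserved documentation networks with placeholder country codes (XA, XB, XC), and the country_code function is my own name.

```python
import bisect
import ipaddress

# Hypothetical data: documentation-only networks mapped to placeholder codes.
_NETWORKS = (
    ("192.0.2.0/24", "XA"),
    ("198.51.100.0/24", "XB"),
    ("203.0.113.0/24", "XC"),
)

# Each row is (first address, last address, code) as integers, sorted by first address.
_TABLE = sorted(
    (int(net.network_address), int(net.broadcast_address), code)
    for net, code in ((ipaddress.ip_network(cidr), code) for cidr, code in _NETWORKS)
)
_STARTS = [row[0] for row in _TABLE]


def country_code(ip):
    """Return the placeholder code for ip, or None if no range covers it."""
    value = int(ipaddress.ip_address(ip))
    # Find the last range that starts at or before the address.
    index = bisect.bisect_right(_STARTS, value) - 1
    if index >= 0 and value <= _TABLE[index][1]:
        return _TABLE[index][2]
    return None
```

A real database has far more ranges, which is why a binary search (or a tree) matters more than it does in this three-row toy.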

PayPal's REST API

Welcome changes from PayPal. Looking forward to the new REST API, which is unfortunately for US developers only. Also the new simplified look of the developer documentation (inspired by Stripe, read this if you want to know what the fuss about Stripe is). They should have bought Stripe instead. Going to try this out in the coming weeks.

RedHat/CentOS vs. Debian/Ubuntu

Interesting discussion on experiences using CentOS/Red Hat, Ubuntu, and Debian. Funny that nobody mentions openSUSE, Gentoo, or Arch for any serious server deployment. Summary of the discussion:

1. CentOS/Red Hat. High-performance computing (HPC), scientific stuff, Oracle databases, or Java-related stuff.

2. Ubuntu. Mostly desktop. Latest and greatest stuff. The Long Term Support (LTS) release is crappy for HPC.

3. Debian. Server stuff, or a more stable and conservative alternative to Ubuntu LTS.

To be fair, for a typical LAMP-stack server, all the above-mentioned GNU/Linux distros are good enough. But Ubuntu/Debian are far more convenient for getting the latest and greatest packages.

Extract Hyperlinks Using Python and PHP

It is always great fun to rewrite a piece of code from one programming language to another. I was looking at this short snippet of Python code by unconscionable which requests a page and dumps certain comment links.

Sample Python code reproduced here.
import urllib2, re
headers = {'User-agent': 'I promise I\'m not doing this a lot',}
req = urllib2.Request("http://www.reddit.com/r/BuyItForLife/search?q=headphones&restrict_sr=on", None, headers)
website = urllib2.urlopen(req)

html = website.read()

links = re.findall('"((http|ftp)s?://.*?)"', html)
for i in links:
    if 'http://www.reddit.com/r/BuyItForLife/comments/' in i[0]:
        print i[0]

My rewrite in PHP using file_get_contents and stream_context_create, something new for me.
<?php

// Send a custom User-agent header with the request.
$options['http']['header'] = "User-agent: I promise I'm not doing this a lot\r\n";
$context = stream_context_create($options);

$url  = "http://www.reddit.com/r/BuyItForLife/search?q=headphones&restrict_sr=on";
// The second argument is use_include_path, which is not needed for a URL.
$html = file_get_contents($url, false, $context);
// s? matches an optional literal "s" (http or https); \s? would match whitespace.
preg_match_all('/"((http|ftp)s?:\/\/.*?)"/i', $html, $links);
foreach ( $links[1] as $link )
{
    if ( strstr($link, 'http://www.reddit.com/r/BuyItForLife/comments') )
        echo $link, "\n";
}

Comparison of both code snippets.
  1. The regex is simpler and more readable in Python. You don’t need to escape certain characters (for example the forward slash /) as in PHP. The API is simpler and makes more sense: results are returned, instead of being written into a by-reference argument as in PHP, where you get an array of arrays.
  2. file_get_contents() is both awesome and dangerous for reading local and remote files alike. I found nothing quite equivalent in Python.
  3. Finding and matching strings is way more readable in Python.
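
As a footnote to the comparison, here is a rough Python 3 take on just the matching part, run against an inline HTML string so it needs no network access. The sample HTML, the abc123 comment ID, and the comment_links helper are made up for illustration. I switched the inner group to a non-capturing one so that findall returns plain strings rather than tuples.

```python
import re

# Non-capturing (?:...) keeps a single capture group, so findall yields strings.
LINK_RE = re.compile(r'"((?:http|ftp)s?://.*?)"')
PREFIX = "http://www.reddit.com/r/BuyItForLife/comments/"


def comment_links(html):
    """Return every quoted URL in html that points at a comments page."""
    return [url for url in LINK_RE.findall(html) if PREFIX in url]


# Hypothetical page fragment, not real search results.
sample_html = (
    '<a href="http://www.reddit.com/r/BuyItForLife/comments/abc123/example/">one</a>'
    '<a href="https://example.com/elsewhere">two</a>'
)
print(comment_links(sample_html))
# prints ['http://www.reddit.com/r/BuyItForLife/comments/abc123/example/']
```

With the original two-group pattern, findall returns tuples, which is why the Python snippet above indexes i[0].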

Javaism in PHP Again.

Via HN. The Java-style absurdity is slowly creeping into the PHP ecosystem. History will repeat itself. I have this feeling that PHP will one day end up like the programming language it dethroned a few years back.

Rasmus (the creator of PHP) needs to step in as the Benevolent Dictator for Life (BDFL) instead of the "community" voting on the future direction of PHP. The language itself is slowly losing its identity and its PHP way. Stop being a poor imitator of Java.

Using GeoIP with PHP - Part 2

Continuing from Part 1. It seems the default php5-geoip package is unstable and core dumped on me. Let’s try the PHP Extension Community Library (PECL) installation method.

Let’s remove the existing php5-geoip package.
$ sudo apt-get --purge remove php5-geoip

Install the necessary packages for PECL installation and compilation.
$ sudo apt-get install php-pear php5-dev libgeoip-dev geoip-database

Download, compile, install, and enable the geoip extension. The tee command is to solve the sudo file permission error.
$ sudo pecl install geoip
$ echo "extension=geoip.so" | sudo tee -a /etc/php5/conf.d/geoip.ini
$ sudo service apache2 restart

Double check that the geoip extension is really loaded properly.
$ php -m | grep geoip
geoip
$ php -r "echo extension_loaded('geoip');"
1

Test the function that gave us the core dump in Part 1. No core dump, and the geoip extension is stable using the PECL installation method. Why?
$ php -r "print_r(geoip_db_get_all_info());"
Array
(
    [1] => Array
        (
            [available] => 1
            [description] => GeoIP Country Edition
            [filename] => /usr/share/GeoIP/GeoIP.dat
        )
......

Weird, what is the difference between the default Ubuntu package and PECL? Let’s check the versions.
$ apt-cache show php5-geoip | grep Version
Version: 1.0.7-8

$ pecl info geoip | grep "Release Version"
Release Version       1.0.8 (stable)

So the two are different versions. Let’s read the latest changelog, which shows one very interesting fix.
* Fix segfault with newer geoip libraries and geoip_db_get_all_info() (bug #60066)

What is bug #60066? The reason PHP segfaults is that GeoIPDBFileName is not defined. Remember the backtrace and the apport crash report title line “Title: php5 crashed with SIGSEGV in add_assoc_string_ex()” from Part 1? Read the fix in Subversion.

Download the source code for php5-geoip from the Ubuntu repository.
$ apt-get source php5-geoip
$ tree -L 1 .
.
├── php-geoip-1.0.7
├── php-geoip_1.0.7-8.debian.tar.gz
├── php-geoip_1.0.7-8.dsc
└── php-geoip_1.0.7.orig.tar.gz

1 directory, 3 files

Prepare the extension for compilation but run the test cases instead.
$ cd php-geoip-1.0.7/geoip-1.0.7/
$ phpize5
$ php -f run-tests.php
ERROR: environment variable TEST_PHP_EXECUTABLE must be set to specify PHP executable!

$ whereis php
php: /usr/bin/php /usr/bin/X11/php /usr/share/php /usr/share/man/man1/php.1.gz

$ export TEST_PHP_EXECUTABLE=/usr/bin/php
$ php -f run-tests.php
......
FAILED TEST SUMMARY
---------------------------------------------------------------------
Calling geoip_db_filename() with a non-existant database type within bound. [tests/008.phpt]
Calling geoip_database_info() with a non-existant database type within bound. [tests/011.phpt]
Checking timezone info with (some) empty fields [tests/014.phpt]
......

Since we installed the geoip extension using PECL, it is better to run the test cases that come with that same version as well. Download the source, configure, and run the tests.
$ wget http://pecl.php.net/get/geoip-1.0.8.tgz
$ tar zxvf geoip-1.0.8.tgz
$ cd geoip-1.0.8
$ phpize5
$ export TEST_PHP_EXECUTABLE=/usr/bin/php
$ php -f run-tests.php
......
FAILED TEST SUMMARY
---------------------------------------------------------------------
Checking timezone info with (some) empty fields [tests/014.phpt]
......

Using GeoIP with PHP - Part 1

For the past few months, I noticed a large number of forum spam bots scanning for loopholes in something that I am working on. While we managed to capture all these IP addresses and block them, I was curious about the locations (countries or cities) of these IP addresses. The only way is to add an IP-to-country resolver, that is, GeoIP support. Below are the installation steps in Ubuntu 12.10.

Install the GeoIP extension and the needed database (which can only look up an IP by country).
$ sudo apt-get install php5-geoip geoip-database

Check our installation
$ php -m | grep geoip
geoip

Check the available databases.
$ php -r "print_r(geoip_db_get_all_info());"
Segmentation fault (core dumped)

However, the default packages seemed to be buggy. Let’s trace the system calls with strace.
$ strace php -r "print_r(geoip_db_get_all_info());"
......
stat("/usr/share/GeoIP/GeoIP.dat", {st_mode=S_IFREG|0644, st_size=1773423, ...}) = 0
stat("/usr/share/GeoIP/GeoIPv6.dat", {st_mode=S_IFREG|0644, st_size=1226717, ...}) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)

Unfortunately there is no core dump file. Why? No worry, let’s check the core dump default behaviour.
$ cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c

Apport? Yes, the automatic crash report generator in Ubuntu. Let’s read the crash reports, which are located in /var/crash. Get the title or summary of the report so we can google for an answer.
$ cd /var/crash
$ cat _usr_bin_php5.1000.crash | grep Title
Title: php5 crashed with SIGSEGV in add_assoc_string_ex()

Googled around for the title message. No luck, nothing related to the geoip extension. Back to the question: where is the core file? Let’s try checking the ulimit.
$ ulimit -c
0

OK. Let’s change that to unlimited and rerun the sample code.
$ ulimit -c unlimited
$ ulimit -c
unlimited

$ php -r "print_r(geoip_db_get_all_info());"
$ ls -l core
-rw-r----- 1 kianmeng kianmeng 15409152 Mar  9 20:07 core
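
As an aside, the same limit can be checked from a script through Python's standard resource module on Unix-like systems. A minimal sketch follows. The helper names are mine, and note that resource reports the limit in bytes while the shell's ulimit -c uses blocks, so finite numbers will not match one to one.

```python
import resource


def core_limit():
    """Return the (soft, hard) core file size limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe(limit):
    """Format a limit the way ulimit does for the unlimited case."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def enable_core_dumps():
    """Raise the soft limit as far as the hard limit allows and return the new soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)[0]
```

Like ulimit in a shell, this only affects the current process and its children, so it has to run in whatever launches the crashing program.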

Now let’s backtrace it using gdb, the GNU debugger.
$ gdb /usr/bin/php core
(gdb) bt
#0  __strlen_sse2_pminub () at ../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:39
#1  0x00000000006bc17e in add_assoc_string_ex ()
#2  0x00007fb7f6404bda in zif_geoip_db_get_all_info () from /usr/lib/php5/20100525/geoip.so

Cisco VPN in Ubuntu

$ sudo apt-get install network-manager-vpnc

Just a simple one-line installation command and the Virtual Private Network (VPN) client seems to work almost perfectly. However, you must save both the user and group passwords together, not individually. For example, if you save just the group password and not the user password, the connection client will prompt you for both passwords even though the group password was saved. One of those weird and frustrating bugs.
"With Elefant, I chose to use method names with under_scores instead of camelCase. I find they substantially improve readability, and while PHP moves ever-closer to Java in its syntax, I strongly prefer the direction Python, Ruby, et al take, and find underscores better match PHP too because every function in PHP uses them. So at this point I’m not renaming everything, or worse, using aliases so both ways can be supported. Instead I’m opting out of PSR-1."
-- Johnny Broadway, emphasis mine

Javaism in PHP

What an interesting piece of code showcasing two major features in PHP, namely namespacing and annotations (not really a core feature but common in those Javaism PHP frameworks). Influenced by Java and implemented "superbly" in PHP. Best decision for PHP going forward.
/**
 * @var \Doctrine\Common\Collections\Collection<\TYPO3\Blog\Domain\Model\Post>
 * @ORM\OneToMany(mappedBy="blog")
 * @ORM\OrderBy({"date"="DESC"})
 */
protected $posts;

How do you replace every character in a string with an asterisk except the first and last characters?

To be more specific, how do you partially obfuscate an email address but still keep it recognizable? An example is john.doe@example.com becoming j******e@example.com.

Use a regex with preg_replace to perform a regular expression search and replace. Inspired by a regex example.
php > $regex = "/(?<!^)\S(?!$)/";
php > echo preg_replace($regex, '.', 'johndoe'), "\n";
j.....e
php >

Breakdown of the regex and explanation of each part.
(?<!^)   # not the first character
\S       # matches any single non-whitespace character
(?!$)    # not the last character

Both (?<!pattern) and (?!pattern) are assertions, a negative look-behind and a negative look-ahead respectively. Assertions are tests on the characters before or after the matched pattern. In our case, we want all characters that are not in the first or last position.

A more detailed breakdown and explanation using explain, an online regular expression tool. Result as shown:
(?<!    # look behind to see if there is not:
^      # the beginning of the string
)       # end of look-behind
\S     # non-whitespace (all but \n, \r, \t, \f, and " ")
(?!     # look ahead to see if there is not:
$       # before an optional \n, and the end of the string
)        # end of look-ahead
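
To tie this back to the email example, here is a minimal Python sketch that applies the same pattern to the part before the @ only, since running it over the whole address would mask the domain too. The mask_email helper is my own name, and it uses an asterisk as the replacement instead of the dot used in the PHP session above.

```python
import re

# Same pattern as above: any non-whitespace character that is neither first nor last.
MASK_RE = re.compile(r"(?<!^)\S(?!$)")


def mask_email(address):
    """Mask the inner characters of the local part and leave the domain readable."""
    local, sep, domain = address.partition("@")
    return MASK_RE.sub("*", local) + sep + domain


print(mask_email("john.doe@example.com"))
# prints j******e@example.com
```

Local parts of one or two characters come back unchanged, because every character in them is either the first or the last.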