Subversion Server Installation and Setup in CentOS 5.x

It is so fricking hard to find the latest and greatest official packages for CentOS. For example, the latest version of Subversion in CentOS 5.8 is 1.6.11. If you want the 1.7.x version, you have to either download it from a third-party repository like WANdisco, upgrade to CentOS 6.x, or compile it on your own. I am always wary of unofficial RPM packages, especially those that you can’t verify. But when I need to set up something fast, or am just plain lazy, I will go ahead and download these RPMs anyway. A direct upgrade from 5.x to 6.x is not supported, and a fresh install is advisable. Compiling from source? Been there, done that, not sure what I was doing. While I understand the CentOS team is severely lacking resources, how I wish there was something similar to Ubuntu PPA for CentOS.

Enough ranting. Let’s proceed with the installation. The installation steps are based on two very helpful guides, the official one and an unofficial one. I am currently stuck with 5.x and badly need the centralized metadata storage (just one .svn hidden folder) in 1.7.x.

Download all the necessary packages and install them. Make sure you have removed the existing Subversion packages first. All commands are run as root.
$ yum remove subversion*
$ yum install httpd
$ rpm -ivh subversion-1.7.7-1.x86_64.rpm
$ rpm -ivh mod_dav_svn-1.7.7-1.x86_64.rpm
$ rpm -ivh subversion-perl-1.7.7-1.x86_64.rpm
$ rpm -ivh subversion-tools-1.7.7-1.x86_64.rpm
$ rpm -ivh subversion-python-1.7.7-1.x86_64.rpm

Backup the original Subversion Apache module configuration file.
$ cd /etc/httpd/conf.d
$ cp subversion.conf subversion.conf.orig

Edit the configuration file (subversion.conf) as follows:
LoadModule dav_svn_module     modules/mod_dav_svn.so
LoadModule authz_svn_module   modules/mod_authz_svn.so

<Location /repo>
DAV svn
SVNListParentPath on
SVNParentPath /repo

SSLRequireSSL
AuthzSVNAccessFile /repo/acl
AuthType Basic
AuthName "Repositories"
AuthUserFile /repo/passwd
Require valid-user
</Location>

Create the repository folder.
$ cd /
$ mkdir repo
$ cd repo

Create the password file (passwd) with a sample user (melanie). The -c flag creates the file, so use it only the first time.
$ htpasswd -cm passwd melanie
New password:
Re-type new password:
Adding password for user melanie

Create a sample ACL file (acl) with the following content:
[/]
melanie = rw
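
Since the ACL file is INI-like, a quick sanity check before restarting Apache can be done with Python's configparser. This is only a rough sketch: the helper name can_write is made up for illustration, and it only looks for an explicit entry. It ignores groups, aliases, `*` entries, and path inheritance, all of which the real authz rules support.

```python
import configparser

# Same content as the sample ACL file above.
ACL_TEXT = """\
[/]
melanie = rw
"""

def can_write(acl_text, user, path="/"):
    """Hypothetical helper: True if `user` has an explicit 'rw' entry for `path`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep user names case-sensitive
    parser.read_string(acl_text)
    if not parser.has_section(path):
        return False
    return parser.get(path, user, fallback="") == "rw"

print(can_write(ACL_TEXT, "melanie"))  # user with an explicit rw entry
print(can_write(ACL_TEXT, "guest"))    # user without any entry
```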

Create a sample project and change the ownership of all folders to the apache user.
$ svnadmin create sample
$ chown -R apache:apache /repo
$ service httpd restart

Browse to https://localhost/repo/ or https://localhost/repo/sample. The server should prompt you for a username and password.

Unity Dash Online Video Search in Ubuntu 12.10

I was learning how to use ngrep, a command-line tool that lets you search and filter the network traffic going in and out of your machine. While testing to see what was being transferred to and from port 80 (see the commands below), I noticed that my newly upgraded Ubuntu 12.10 kept making HTTP requests to videosearch.ubuntu.com.
$ sudo apt-get install ngrep
$ sudo ngrep -d any port 80

It seems the default settings in Ubuntu 12.10 enable the Unity video lens to periodically connect to this server to fetch video recommendations. If you do not use this feature, it is advisable to remove or disable it. Just run this command, then log out and log back in.
$ sudo apt-get --purge remove unity-scope-video-remote

I was curious about the content of the JSON file, so we are going to make a manual HTTP query to capture it. Note that a direct browser query returns an empty result (I suspect user-agent checking), so we have to use curl or wget. The URL is quoted so that the shell does not treat the & as a background operator.
$ curl -o videosearch.json "http://videosearch.ubuntu.com/v0/search?q=&sources=Amazon"

The returned JSON file contains a list of YouTube videos. Changing the request parameters (q and sources) returns almost the same result. Nothing interesting here. Let's inspect the HTTP response headers instead.
$ curl -D header -o videosearch.json "http://videosearch.ubuntu.com/v0/search?q=&sources=Amazon"

Note that I removed the timestamps and added some comments after the relevant header lines.
$ cat header
HTTP/1.1 200 OK
Date: .....
Server: gevent/0.13.0 gunicorn/0.13.4

The response comes from gunicorn, a lightweight Python WSGI HTTP server, running with gevent, a coroutine-based Python networking library.
Vary: X-Geo-Country

Vary lists the request header fields (in this case X-Geo-Country) that a cache must take into account when deciding whether a stored response can be reused or has to be fetched again from the application server.
Content-Type: application/json
Content-Length: 10309
Expires: ......

Age: 203

The age in seconds of the JSON file in the proxy cache.
X-Cache: HIT from alkes.canonical.com
X-Cache-Lookup: HIT from alkes.canonical.com:3128

The requested content was found in the cache (HIT) of the caching server.
Via: 1.0 alkes.canonical.com:3128 (squid/2.7.STABLE7)

Squid, a popular web caching server.
Via: 1.1 videosearch.ubuntu.com
Connection: close

The HTTP request has gone (back and forth) through these two servers. Note that alkes.canonical.com is also the Ubuntu music search API server.
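
If you would rather not eyeball the header file, the same fields can be pulled out with Python's standard library. A minimal sketch, assuming the header lines are pasted in as a string (the status line and the elided Date and Expires values are left out); the function name summarize and the dictionary keys are my own.

```python
from email.parser import Parser

# Header lines copied from the capture above, minus the status line
# and the Date/Expires values that were removed.
RAW_HEADERS = """\
Server: gevent/0.13.0 gunicorn/0.13.4
Vary: X-Geo-Country
Content-Type: application/json
Content-Length: 10309
Age: 203
X-Cache: HIT from alkes.canonical.com
X-Cache-Lookup: HIT from alkes.canonical.com:3128
Via: 1.0 alkes.canonical.com:3128 (squid/2.7.STABLE7)
Via: 1.1 videosearch.ubuntu.com
Connection: close
"""

def summarize(raw_headers):
    """Hypothetical helper: extract the cache-related fields."""
    msg = Parser().parsestr(raw_headers, headersonly=True)
    return {
        "age_seconds": int(msg["Age"]),
        "cache_hit": msg["X-Cache"].startswith("HIT"),
        "hops": msg.get_all("Via"),  # a repeated header comes back as a list
    }

for key, value in summarize(RAW_HEADERS).items():
    print(key, value)
```

HTTP headers use the same "Name: value" layout as email headers, which is why the email parser works here.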

Upgrade Subversion Client to Version 1.7 in Ubuntu 12.04

Before upgrading your local working copy, you’ll need to clean up your working directory to prevent any errors during the upgrade.
$ svn cleanup .

Upgrade your Subversion client from 1.6.x to 1.7.x in Ubuntu 12.04
$ sudo apt-add-repository ppa:svn/ppa
$ sudo apt-get update
$ sudo apt-get dist-upgrade
$ svn --version
svn, version 1.7.5 (r1336830)
   compiled Jun 26 2012, 22:39:53

Upgrade your local working copy
$ svn upgrade .
$ tree -L 1 .svn
.svn
├── entries
├── format
├── pristine
├── tmp
└── wc.db
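
To double-check that the upgrade really left just one metadata folder, something like the sketch below can walk the working copy. It is a heuristic under my own assumptions: the function names are made up, and nested working copies or externals would legitimately carry their own .svn folder and make it report False.

```python
import os

def find_svn_dirs(root):
    """Return the paths of all .svn directories under root."""
    found = []
    for dirpath, dirnames, _ in os.walk(root):
        if ".svn" in dirnames:
            found.append(os.path.join(dirpath, ".svn"))
            dirnames.remove(".svn")  # do not descend into the metadata itself
    return found

def looks_like_1_7(root):
    """Heuristic: exactly one .svn, at the top level, containing wc.db."""
    top = os.path.join(root, ".svn")
    return find_svn_dirs(root) == [top] and os.path.isfile(os.path.join(top, "wc.db"))

if __name__ == "__main__":
    print(looks_like_1_7("."))
```

A 1.6 working copy keeps a .svn folder in every directory, so the check fails there until svn upgrade has been run.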

I didn’t realize this until I read it in the documentation:
"However, some of the new 1.7 features may not be available unless both client and server are the latest version. There are also cases where a new feature will work but will run less efficiently if the client is new and the server old."
Now I understand why my local working copy does not have all the version 1.7 features. Both client and server must be on the latest version. The question now is how to upgrade those ancient clients on CentOS?

Parsing HTML Document Using PHP Native Extensions

While there exist many third-party libraries to parse and process HTML documents, these libraries are too bloated when you just need to write a simple single-file script. Hence the question: is it possible to parse HTML using PHP's native built-in extensions? Yes, through DOM and DOMXPath. However, it will take a while before you’re familiar with both APIs.

Below is sample code to download, parse, and print all links from the reddit main page. Note that when parsing an HTML5 document, you will encounter the "Tag time invalid in Entity" warning, as DOM defaults to the HTML 4 Transitional DTD, which does not contain the newer HTML tags. Just use the @ operator to suppress the warning.

<?php
$html = file_get_contents("http://reddit.com");
$dom  = new DOMDocument();
@$dom->loadHTML($html);

$finder = new DOMXPath($dom);
$links  = $finder->query('//a[@class="title"]');
foreach ( $links as $link )
{
    echo $link->nodeValue . "\n";
    echo $link->getAttribute("href") . "\n\n";
}
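
For comparison, here is roughly the same idea using only Python's standard library, since I am learning Python anyway. It is a sketch, not a port: the class name TitleLinkParser and the inline sample HTML are made up, and it parses a string instead of downloading the page. Like the XPath query above, it only matches anchors whose class attribute is exactly "title".

```python
from html.parser import HTMLParser

class TitleLinkParser(HTMLParser):
    """Collect (text, href) pairs for <a class="title"> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_link = False
        self._href = ""
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "title":
            self._in_link = True
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append(("".join(self._text).strip(), self._href))
            self._in_link = False

# Made-up sample markup standing in for the downloaded page.
SAMPLE = '<p><a class="title" href="/r/one">First post</a> <a href="/x">skip</a></p>'

parser = TitleLinkParser()
parser.feed(SAMPLE)
for text, href in parser.links:
    print(text)
    print(href)
```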

Find Top-Level Domain (TLD) With MySQL

I was stuck with the problem of extracting the Top-Level Domain (TLD) from a column in a MySQL database. The TLD is the last part of the domain name, for example the .com in google.com. As TLDs vary in length, such as .io (popular with startups), .com, or .name, how do you extract this part without using any code but only SQL? Why SQL? Because it’s easier to group domains by TLD using a GROUP BY clause.

Luckily, MySQL’s SUBSTRING_INDEX function can solve this problem easily. It returns the part of a string before (or after) a given number of occurrences of a delimiter. First, let’s look at the function definition and parameters.

SUBSTRING_INDEX(str, delim, count)
  • str, the string to operate on
  • delim, the delimiter that splits the string into parts
  • count, the number of delimiter occurrences to count from the left (if positive) or from the right (if negative); everything before (or after) that occurrence is returned

Examples:
mysql > select substring_index('111.222.333.444.555', '.', '1');
-> 111

mysql > select substring_index('111.222.333.444.555', '.', '2');
-> 111.222

Note: a negative count returns the result counted from the right.
mysql > select substring_index('111.222.333.444.555', '.', '-1');
-> 555

mysql > select substring_index('111.222.333.444.555', '.', '-2');
-> 444.555

Back to my original question, how do we extract different TLD from a domain name?
mysql > select substring_index('www.google.io', '.', '-1');
-> io

mysql > select substring_index('www.google.com', '.', '-1');
-> com

mysql > select substring_index('www.google.name', '.', '-1');
-> name

Unfortunately, this only works for single-label TLDs and not for second-level or deeper suffixes, as in www.yahoo.co.jp, where the effective TLD is co.jp.
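
To make the counting behaviour easier to see, here is a small Python imitation of the function, followed by the GROUP BY idea done with a Counter. It is a sketch of how I understand SUBSTRING_INDEX to behave, not MySQL's actual implementation, and the list of domains is made up.

```python
from collections import Counter

def substring_index(s, delim, count):
    """Imitate MySQL's SUBSTRING_INDEX for illustration."""
    if not delim or count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # keep the first `count` parts
    return delim.join(parts[count:])       # keep the last `-count` parts

print(substring_index("111.222.333.444.555", ".", 2))   # 111.222
print(substring_index("111.222.333.444.555", ".", -2))  # 444.555

# Rough equivalent of GROUP BY on the extracted TLD.
domains = ["www.google.io", "www.google.com", "mail.google.com", "www.google.name"]
tld_counts = Counter(substring_index(d, ".", -1) for d in domains)
print(tld_counts.most_common())
```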

Removing Staircase Effect in Vim

Name one of the repetitive manual tasks in your life right now?
I always have to toggle between paste and nopaste mode in vim to prevent misaligned source code indentation. This is also known as the staircase effect.

Staircase effect? Tell me more about it.
Vim assumes all pasted text is typed into the editor rather than pasted. If autoindent is on, the editor indents each line again on top of the indentation it already has, so every line drifts further to the right.

OK. How are you going to reduce the manual typing?
Configure vim with a keyboard shortcut, as described in this vim tip. In your .vimrc file, add these options.
" switch between paste and no paste mode fast and go into insert mode
" after that
let mapleader = ","
nnoremap <leader>p :set invpaste paste?<CR>i
set pastetoggle=<leader>p
set showmode

What’s the improvement?
Number of key presses to copy and paste:
Without keyboard shortcut : 21
With shortcut : 4


Retrospection on Using Kohana PHP Framework

Some retrospection on an old project done with the Kohana PHP framework. There are quite a few things that I wish had been done differently. After some googling to identify the exact version of the framework, it turned out the project was running on 3.0.x.

The web application could be made more secure by using a better folder structure, similar to the customized layout for CodeIgniter. The main index.php file should be put into a separate public (document root) folder, with the structure as shown:
├── application 
├── modules 
├── system 
├── public_html 
│   └── index.php

All asset files (css, js, and img) should also be put in a folder called assets under public_html.
├── public_html 
│   └── assets
│       ├── js
│       ├── css
│       ├── img

We could have shortened the development time of the project if Twitter’s Bootstrap had been chosen as the CSS framework. Quite a lot of time was spent on tweaking the user interface and in the end, the web application still looked quite butt-ugly.

I should have used Mercurial instead of Git. I was just starting to learn DVCS, and Git is not a good choice for someone from a Subversion background. The experience was excruciatingly painful and stupid as well. The reason we went for Git was pricing: at the time, GitHub was cheaper than Bitbucket for private projects. But now Bitbucket is cheaper. How can you compete with unlimited free repositories?

Both authentication and authorization were hacked-up work, very rigid and painful to modify as well. An Access Control List (ACL) is damn hard to get right.

Picking Kohana was the wrong choice at that moment. As the framework has gone through several rewrites, documentation is limited and most of the time you have to refer to the code itself. I should have picked the CodeIgniter and DataMapper ORM combination instead; at least I now know the development time could have been shortened. This does not mean that CodeIgniter is better, just that it is more stable and has more documentation available.

What’s next? I am going to upgrade or rewrite the web app on the latest version, 3.2.x, and try to do things differently this time. I also have to relearn Kohana, as it has been a while since I last looked at it.

LPTHW - Day 67 Else and If

There is no Day 65 or Day 66. Busy with family stuff.

Doing exercise 30 of LPTHW. Nothing fancy, just if-elif-else statements. Still trying to get used to conditional statements in Python without opening and closing brackets. Nothing worth mentioning about this exercise, just some typing practice for me.

LPTHW - Day 13 Strings and Text

No Day 12, I fell asleep on my lappy halfway through exercise 6 in LPTHW.

Multi-variable formatted string in Python 2.x
>>> print "No %s, No %s" % ('Pain', 'Gain')
No Pain, No Gain

In Python 3, using positional (ordinal) arguments
>>> print ("No {0}, No {1}".format('Pain', 'Gain'))
No Pain, No Gain

or named arguments, which I love a lot since they make the code readable at a glance.
>>> print ("No {first}, No {second}".format(first='Pain', second='Gain'))
No Pain, No Gain

While in PHP
php> printf("No %s No %s", 'Pain', 'Gain');
No Pain No Gain

String Concatenation in Python
>>> print "foo"+"bar"
foobar

In Python 3
>>> print ("foo"+"bar")
foobar

Meanwhile in PHP
print "foo" . "bar";

LPTHW - Day 07 Comments And Pound Characters

There is no day 6, my house was "attacked" by apples, oranges, and pear.

Doing the second exercise in the LPTHW book. I learned that the symbol # is named differently depending on where or who you are. And where I come from, according to Wikipedia, we call it hex? Seriously, I think it is hash instead.

Just remember: to Americans and some Canadians it is pound, and to everyone else it is hash. But in the book, Zed uses the name octothorpe, which was commonly used in Bell Labs' telephone system. By the way, you pronounce it "ok-tuh-thawrp".

In the extra credit section, we were asked to "review each line going backwards". Why backwards? As we normally don't read backwards, it slows you down and helps you focus and catch mistakes.

Also, we were asked to "read what you typed above out loud, including saying each character by its name." By doing so, you can easily catch mistakes and reinforce what you've coded. To make this even more fun, why not ask someone else to read it for you? Let's get our good old espeak.
$ sudo apt-get install espeak
$ espeak -f ex2.py

If you notice, learning a new programming language is like learning to read and write a foreign language. All the foreign language learning techniques can be applied here as well.

LPTHW - Day 04 A Good First Program

tl;dr : print is a statement in 2.x but in 3.x, print is a function.

As a developer with some experience, one tends to skip the "hello, world" example. Nope, not going to happen here, I am going to do everything step by step. As Shunryu Suzuki once said,
"In the beginner's mind there are many possibilities, in the expert's mind there are few."
Being experienced and having an expert mentality makes it harder to absorb new ways of doing stuff and unlearn all the bad habits. Do it like a beginner, with no cut-and-paste.

Typed and ran everything below.

ex1.py
print "Hello World!"
print "Hello Again"
print "I like typing this"
print "This is fun."
print 'Yay! Printing.'
print "I'd much rather you 'not'."
print 'I "said" do not touch this.'

Works fine, except that when I ran it with Python 3, I got this error message.
$ python3 ex1.py
File "ex1.py", line 1
print "Hello World!"
^
SyntaxError: invalid syntax

It seems to work in 2.6 but fails in 3? So what's going on here? According to the list of new features in Python 3, print used to be a statement in 2.x but is now a function in 3.x. Why? According to Georg Brandl's rationale, as I read it, a function's behaviour is easier to extend and override compared to a statement.

So, change ex1 by adding parentheses around the strings we want to print as follows and save it as ex1-python3.py. Then you will get error-free output.

ex1-python3.py
print("Hello World!")
print("Hello Again")
print("I like typing this")
print("This is fun.")
print('Yay! Printing.')
print("I'd much rather you 'not'.")
print('I "said" do not touch this.')

On a side note, there is no Day 3. Shit happens.

What if you want to stick to the Python 3 print function but still want to run it using version 2.x? You can do this using the __future__ module, which lets you backport certain Python 3 features to Python 2. For our example of using the print() function, we just need to import this feature into our code as shown below. Thanks to kamal for this tip.
from __future__ import print_function

LPTHW - Day 02 Installing Python 2 and 3 in Ubuntu 10.10

By default, the Python version in Ubuntu 10.10 is 2.6. Upgrading to version 2.7 is not compulsory unless necessary. But still, we want to explore version 3. The installation procedure is as follows:
$ sudo apt-get install python3-minimal
$ python3

$ python3 -V
Python 3.1.2

Default python installation
$ python -V
Python 2.6.6

LPTHW - Day 01 Picking Python as New Learning Programming Language

If you have read the book "The Pragmatic Programmer", its authors suggest that a programmer who wants to expand their knowledge should "learn at least one new language every year". I will give it another shot by trying to really settle down and learn a new programming language this year. No more lame excuses.

So which language? Python. Why Python? Because it's not PHP. Why not Ruby? Because P > R ! Joke aside, either one is okay.

I will start with Zed Shaw's "Learn Python the Hard Way" as well as "The Hitchhiker’s Guide to Python!" to get the fundamentals right. Later, I will hopefully try to create something non-web-based with it to further my understanding of the language and the ecosystem.

Many years ago, Peter Norvig wrote that you need ten years to master a programming language. Malcolm Gladwell, in his book Outliers, further supports this claim by narrowing it down to 10,000 hours for anyone to become an expert in any field. How long is it going to take me to become a Python expert? 54-plus years. Yes, fricking 54-plus years if I spend half an hour a day practicing and using it every day. Pretty damn long, right?
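
For the record, here is the back-of-the-envelope arithmetic behind that number, as a tiny Python snippet. The constant names are my own, and a year is simply taken as 365 days.

```python
HOURS_TO_EXPERT = 10000   # the 10,000-hour figure
HOURS_PER_DAY = 0.5       # half an hour of practice a day
DAYS_PER_YEAR = 365       # ignoring leap years

days_needed = HOURS_TO_EXPERT / HOURS_PER_DAY   # 20000 days
years_needed = days_needed / DAYS_PER_YEAR
print("%.1f years" % years_needed)              # 54.8 years
```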

But wait. Who is a Python expert? How do you define a person who is an expert in Python? And the most important question is, does it really matter? No. Why not just enjoy the learning process and have fun creating, exploring, and experimenting. Make lots of mistakes and fail miserably. No one is going to give a damn whether you're an expert or not.

By the way, happy new year.