Find and Delete All Duplicate Files

I was asked this question today but couldn't think of a quick answer. The typical manual solution is to compare file sizes first and then file contents through a hash or checksum.

There seem to be quite a number of duplicate file finder tools, but we will try a console tool called fdupes. Typical usage of the program is shown below.

1. Install the program.
$ sudo apt-get install fdupes

2. Create sample testing files.
$ cd /tmp
$ wget -O a.jpg
$ cp a.jpg b.jpeg
$ touch c.jpg d.jpeg

3. Show all duplicate files.
$ fdupes -r  .


4. Show all duplicate files but omit the first file in each set.
$ fdupes -r -f .


5. Same as step 4, but delete the duplicate files and keep one copy of each (note that plain xargs breaks on file names containing spaces).
$ fdupes -r -f . | grep -v '^$' | xargs rm -v
removed `./b.jpeg'                      
removed `./d.jpeg'

On a similar note, there is an interesting read on the optimized approach used by Dupseek, an app that finds duplicate files. The main strategy is to group the files by size, ignore any group containing just one file, and compare files only within the remaining groups.
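
The group-by-size strategy can be sketched in a few lines of Python. This is only a minimal illustration, not Dupseek's actual algorithm: it swaps the content comparison for a SHA-256 hash, and the function names find_duplicates and file_digest are made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root that share identical content."""
    # Pass 1: group regular files by size (cheap, no file reads).
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files whose size is shared with another file.
    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        duplicates.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return duplicates
```

Calling find_duplicates(".") returns one sorted list of paths per set of identical files; deleting everything but the first entry of each list would mirror step 5.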

Unfortunately, I have a hard time understanding the Perl code. The closest and simplest implementation I could find is the tiny find-duplicates Python script.

Which Is More Readable or Preferable?

Going through my daily subscribed feeds, I noticed that my two monitors were displaying two ebooks, published to the web differently. On the left is The Feynman Lectures, and on the right, Learn Web Development: The Ruby on Rails Tutorial. Both are rendered in the Georgia font but at different sizes and layouts.

If you ask me, I prefer the left screenshot. It is straightforward and clean, has good contrast, and makes the best use of the layout. A good example of a good hypertext document on the Web.

PHP 5.6 New Features

It has been a while since I last coded anything significant in PHP, but this 5.6 release does have quite a few significant features. Is this the final stable version before we all move to PHP 7?

The only feature I really like is the support for variable-length argument lists through variadic functions and argument unpacking. An example is shown below.
function sum(...$numbers) {
    return array_sum($numbers);
}

$nums = [1, 2, 3, 4];
echo sum(1, 2, 3, 4), "\n"; // 10
echo sum(), "\n";           // 0
echo sum(...$nums), "\n";   // 10
echo sum($nums), "\n";      // 0 (the whole array is one argument, which array_sum skips)

As for the changes to namespaces: while the use operator now supports importing functions, constants, and classes, it is still very limited, or rather half-baked, compared to Java or Python. For example, we still can't use wildcards for mass imports or a shorter syntax for importing selected names, as shown.
// Not working.
use MyProject\Feature\*;
from FooLibrary use Foo, Bar, Baz;

Also, don't get me started on the whole backslash (\) as the namespace separator. I cringe every time I think about or look at it. Sigh.

phpdbg, which is something new to me, is now included and implemented as a Server Application Programming Interface (SAPI) module. Unfortunately, I couldn't get it to work; I'm going to try it out in another post.


"... the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies."
-- James Lewis & Martin Fowler, emphasis added
The term "microservices" has lingered in my mind for the past two weeks, but I didn't pay much attention to it until today. Yes, yet another gimmicky development term, which seems to be a rebranding of the Unix philosophy and a simplified version of Service-Oriented Architecture (SOA). Sigh, the side effect of the trending butt, ahem, cloud technology these days.

How do you implement this architectural style? Decompose your monolithic system and move each component into its own service. Each service can be developed using any platform, programming language, or data store, but communicates through JSON over HTTP or a lightweight message bus. In short, it is a change in the communication style between components, from function calls to messages.
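
To make the "function calls to messages" shift concrete, here is a rough sketch. The names add_tax and handle_message are hypothetical, and the HTTP or message-bus transport is left out entirely; only the JSON boundary is shown.

```python
import json


def add_tax(amount, rate):
    """Monolith style: other components call this function directly."""
    return round(amount * (1 + rate), 2)


def handle_message(raw_request):
    """Service style: the same logic sits behind a JSON message boundary."""
    request = json.loads(raw_request)
    total = add_tax(request["amount"], request["rate"])
    return json.dumps({"total": total})


# Before: a plain in-process function call.
direct = add_tax(100.0, 0.06)

# After: build a JSON request, hand it to the service, parse the JSON reply.
# In a real deployment the string would travel over HTTP or a message bus.
request_body = json.dumps({"amount": 100.0, "rate": 0.06})
reply = json.loads(handle_message(request_body))

print(direct, reply["total"])
```

The business logic does not change at all; only the way the caller reaches it does.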

Nothing new here, old wine in a new bottle.

Android Java

"Android - really the biggest reason today why anyone besides the enterprise guys cares about Java anymore - is well down this dark dark road as well. It’s increasingly common to read a page of Android API documentation and have no idea what the fuck it’s talking about initially. You get there eventually of course, you just have to take a detour through 17 other classes. What, you can’t handle that? You obviously lack the perseverance and vision to conceptualise this grand cathedral that has been built to populate a list. Loser."
-- Neil Sainsbury, additional emphasis added.
You have to agree that Java is still relevant today due to the popularity of Android. Otherwise it would be headed for the same fate as Cobol or 4GL in the enterprise world. But of course, Enterprise Java and Android Java are two different beasts.

Static Variable in Python

The solution, as shown below, is actually quite straightforward. By using a function attribute, you can emulate a static variable in a function without using a global variable.
def myfunc():
    if not hasattr(myfunc, "counter"):
        myfunc.counter = 0  # it doesn't exist yet, so initialize it
    myfunc.counter += 1  # runs on every call, not only the first
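
A quick usage sketch of the same idea, using a hypothetical function named count_calls that also returns the counter so the effect is visible:

```python
def count_calls():
    # The attribute lives on the function object itself, so it
    # persists between calls without any global variable.
    if not hasattr(count_calls, "counter"):
        count_calls.counter = 0  # first call: initialize
    count_calls.counter += 1
    return count_calls.counter


for _ in range(3):
    latest = count_calls()

print(latest)  # 3
```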

Coming from C-based programming languages, PHP in my case, it is going to take a while for me to adapt to Python. Yes, I still write Python like PHP; hence, more practice is needed.

Python and Makefiles

"Besides building programs, Make can be used to manage any project where some files must be updated automatically from others whenever the others change."
-- Wikipedia on Make
One of the issues I noticed while doing a Django project was the bunch of shell scripts in the project folder. These tiny shell scripts mostly dealt with creating the virtualenv, resetting the database, and so on. Wouldn't it be nice if we could combine and group all these scripts into one file?

That is possible and easy through Make and makefiles, provided that you're willing to pick up the rules. The only minor annoyance is that those running Windows need to install Make for Windows.

Some things I learned while working on my Makefile.

1. A rule tells the make program how to build your program. The syntax, as shown below, is straightforward and consists of a target, its dependencies, and the commands to run (each command line must start with a tab).
target: dependencies
	commands

2. Prefix the build target name with '_' if you don't want the target to show up in your shell's autocompletion.

3. Build script too noisy or verbose, for example, printing the working directory? To silence it, use make -s or prefix the commands with '@'.

Finding and Deleting Files, xargs rm vs find -delete

An interesting comparison of finding and deleting files using xargs versus the find command alone.

Create 100k files of 10 bytes each (1,000,000 bytes split into 10-byte pieces).
$ mkdir /tmp/test && cd /tmp/test
$ dd if=/dev/zero of=masterfile bs=1 count=1000000
$ split -b 10 -a 10 masterfile

Using xargs.
$ time find -name 'xaa*' -print0 | xargs -0 rm
real    0m7.667s
user    0m1.112s
sys     0m6.491s

Using find with -delete option.
$ time find -name 'xaa*' -delete
real    0m7.252s
user    0m0.954s
sys     0m6.023s

A time difference of 0.415s, which is insignificant. However, the -delete method is way easier to remember.

Noto Sans CJK

"Noto Sans CJK is a sans serif typeface designed as an intermediate style between the modern and traditional. It is intended to be a multi-purpose digital font for user interface designs, digital content, reading on laptops, mobile devices, and electronic books."
I've been reading a lot of Chinese text these days, but the text displayed by the default CJK fonts is awful. When Google released the Noto fonts, I was hoping they would improve the readability. Sadly, the result remains the same.

Installation is quite straightforward.
1. Download the Simplified and Traditional Chinese fonts from Google's Noto font site.

2. Unzip the files and move the .otf files to a system font folder.
$ sudo mv *.otf /usr/share/fonts/truetype/

3. Update the font cache.
$ sudo fc-cache -f

As you can see from the captured screenshots of both Chrome and Iceweasel/Firefox, the text is fuzzy, due to anti-aliasing, and hard to read at the default font size. Readability only improves once you zoom to 150%. Iceweasel/Firefox fares worse than Chrome, as the font rasterization is kind of messed up, with a mix of aliased and anti-aliased text.

Feeling disengaged? Burned-out or bored-out?

Three things to do.

First, take a sabbatical. That's the common sentiment in the forum. Get away from the Internet and from any electronic gadgets, and go back to nature. Let go of the fear of missing out.

Second, look into Maslow's hierarchy of needs. Ask yourself honestly, seriously, don't bullshit yourself, right now, about your current needs. Is it physiological, safety, love/belonging, esteem, or self-actualization?

Lastly, pick and plan your next step. Don't repeat yourself; do something different. Try a different domain. Follow up on your childhood dreams or the items on your bucket list.

On Django Grappelli

When adding any new framework or library to your development, you'll eventually encounter the 80/20 rule of software development: you've finished 80% of the work but are stuck almost indefinitely on the remaining 20%.

Several things I learned the hard way about Django Grappelli.

The order of entries in INSTALLED_APPS is very important. The Grappelli module must come before the django.contrib.admin module. Otherwise, the changes to the admin layout will not take effect.
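
A minimal settings.py fragment showing that ordering. The other apps listed are just a typical, assumed set for illustration; only the relative position of the first two entries matters here.

```python
# settings.py (fragment)
INSTALLED_APPS = (
    "grappelli",                    # must be listed before the admin app
    "django.contrib.admin",
    "django.contrib.auth",          # remaining entries are an assumed example
    "django.contrib.contenttypes",
)
```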

If you want to customize the layout and use the default CSS styling, read the documentation on templates. Unfortunately, it is not googlable and must be accessed locally through your Django instance at http://localhost:8000/grappelli/grp-doc/. Oh boy, so much time wasted googling for a tutorial or documentation on customization.

Where is the bloody documentation on the nav-global block?