Invalid UTF-8 sequence in Subversion

Ran into this when I tried to update my local working copy. The error message was as follows:
$ svn up

svn: Valid UTF-8 data
(hex: 49 50 )
followed by invalid UTF-8 sequence
(hex: a0 2d 56 69)
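
The hex values in the message are the raw bytes svn could not convert, so they already hint at what the bad name looks like. A quick Python sketch to decode them. The guess that the stray byte comes from a single-byte charset such as ISO-8859-1 is my assumption, not something svn reports.

```python
# Bytes copied from the two "hex:" lines of the svn error message.
valid = bytes.fromhex("49 50")
invalid = bytes.fromhex("a0 2d 56 69")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(valid.decode("utf-8"))           # IP
print(is_valid_utf8(valid + invalid))  # False

# 0xa0 is a UTF-8 continuation byte, so it cannot start a character.
# Assumption: the name was written in ISO-8859-1, where 0xa0 is a
# non-breaking space.
print(repr((valid + invalid).decode("latin-1")))  # 'IP\xa0-Vi'
```

So the thing to look for is probably a name starting with "IP", then a non-breaking space, then "-Vi".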

No idea which file(s) were causing this. Googled around for a quick solution. Found that you need to go into every folder and run svn info to track down the culprit file. Not a good solution.

Second approach. Tried to narrow down the scope, using svn log -v to trace back through the recently committed files.

Googled around for an answer. Suspected that some of the committed files were encoded in a different charset. Checked using the file -bi command as shown. Not helpful at all.
$ file -bi config.php

text/x-c++; charset=us-ascii

Googled again. Interesting solution proposed by cooper:
"strace svn status will give you the name of the offending file. unfortunately, svn care about name of files that are in one of its directories, even if it’s not under revision."
Tried again using strace, a utility to monitor system calls [4]. Voilà, the offending file is highlighted in bold.
$ strace svn status

fstat(3, {st_mode=S_IFDIR|0777, st_size=278528, ...}) = 0
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
getdents(3, /* 38 entries */, 4096)     = 4088
write(2, "svn: Valid UTF-8 data\n(hex: 49 5"..., 122svn: Valid UTF-8 data
(hex: 49 50)
followed by invalid UTF-8 sequence
(hex: a0 2d 56 69)
) = 122

It seems the offending files were located in the pdf folder. Removed that folder and tried svn status again.

Whoa! Problem solved. The root cause was certain generated PDF files whose file names were in an unsupported encoding.
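
With hindsight, since the culprit is a file name and not file contents, the working copy can also be scanned directly for names that are not valid UTF-8. A minimal Python sketch, assuming a Linux file system that stores names as raw bytes. The function name find_non_utf8_names is my own, not part of any svn tooling.

```python
import os
from typing import List


def find_non_utf8_names(root: str) -> List[bytes]:
    """Return paths (as bytes) under root whose names are not valid UTF-8."""
    bad = []
    # Walking with a bytes path makes os.walk yield the raw bytes of each
    # name, so nothing is decoded behind our back.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(os.path.join(dirpath, name))
    return bad
```

Calling find_non_utf8_names(".") from the top of the working copy returns the suspect paths as bytes, unversioned files included, which matters because svn trips over those too.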

Seriously, I fricking love strace!
