OSGalaxy

published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2009-02-25 10:35:00 in the "English" category

You might have noticed a series of issues with my presence on Planet Gentoo over the past months. Sometimes posts didn’t appear for a few days, then entries were figuratively posted in the future, and a couple of bouts of planet spam made my posts quite obnoxious to many. I didn’t like it either; it seems I had some problems with Typo when I moved from lighttpd to Apache, and then there were issues with Planet and its handling of Atom feeds and the like. These problems should now be solved: Planet has moved to the Venus software, and it uses the Atom feeds again, which are much more easily updated.

But that is not my topic today; today I wish to write about how you can really mess things up with XML technologies. Yesterday I wanted to prepare a feed for the news on xine’s website so that it could be shown on Ohloh too. Since the idea is to use static content, I wanted to generate the feed with XSLT, starting from the same data used to generate the news page. Not too difficult actually; I do something similar for my website as well.

But since my website only needs to sort-of work, while the xine site needs to actually be usable, I decided to validate the generated content using the W3C validator; the results were quite bad. Indeed, the content in an RSS feed needs to be escaped or just plain text; no raw XHTML is allowed.

So I turned to Atom, which is supposedly better at these things and is already being used for a lot of other stuff as well. For once it really looks like XML technology, using the thing that actually makes it work nicely: namespaces. But when I look at my blog’s feed I see a very complex XML file. I gave up on it for a while and went back to RSS, but while the feed is simple around the entries, the entries themselves are quite a bit to deal with, especially since they require the RFC822 date format, which is not really the nicest thing to work with (for one, it expects day names and month names in English, and it’s far from easy for a machine to parse into a generic date that can then be rendered in the feed user’s locale).
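
To make the date-format complaint concrete, here is a minimal C sketch that renders the same instant both ways: the RFC822-style date that RSS asks for, and the RFC 3339 timestamp that Atom uses. The function names are made up for illustration, and the sketch assumes the program stays in the default C locale (no setlocale() call), which is what keeps %a and %b in English.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: RSS-style date, e.g. "Wed, 25 Feb 2009 10:35:00 +0000".
 * %a and %b are locale-dependent, so this only gives the English names the
 * format requires as long as the C locale is in effect.
 * Returns the number of characters written, or 0 on failure. */
size_t format_rfc822(time_t when, char *buf, size_t len)
{
    const struct tm *utc = gmtime(&when);   /* not thread-safe; fine for a sketch */
    if (utc == NULL)
        return 0;
    return strftime(buf, len, "%a, %d %b %Y %H:%M:%S +0000", utc);
}

/* Hypothetical helper: Atom-style (RFC 3339) date, e.g. "2009-02-25T10:35:00Z".
 * Purely numeric, so no locale is involved when writing or parsing it. */
size_t format_rfc3339(time_t when, char *buf, size_t len)
{
    const struct tm *utc = gmtime(&when);
    if (utc == NULL)
        return 0;
    return strftime(buf, len, "%Y-%m-%dT%H:%M:%SZ", utc);
}
```

For the instant 1235558100 seconds after the epoch, the first helper yields Wed, 25 Feb 2009 10:35:00 +0000 and the second 2009-02-25T10:35:00Z; only the second can be produced and parsed without caring about names of days and months.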

I reverted to Atom, created a new ebuild for the Atom schema for nxml (which, by the way, fails to allow auto-completion in XSL files; I need to contact someone about that), and started looking at what is strictly needed. The result is a very clean feed which should work just fine for everybody. The code, as usual, is available in the repository.

As soon as I have time I’ll look into switching my website to also provide an Atom feed rather than an RSS feed. I’m also considering redirecting the requests for the RSS feeds on my blog to Atom, if nobody gives me a good reason to keep RSS. I have already hidden them from the syndication links on the right, which now only present Atom feeds, and the Atom feeds are already requested more often than the RSS versions. For those who can’t see why I’d like to standardise on a single format: I don’t like redundancy where it’s not needed. In particular, if there is no practical need to keep both, I can reduce the amount of work done by Typo by hiding the RSS feeds and redirecting them from within Apache rather than letting them hit the application. Considering that Typo creates feeds for each one of the tags, categories and posts (the latter I already hide and redirect to the main feed, since they make no sense to me), it’s a huge number of requests that would be merged.

So if somebody has reasons for which the RSS feeds should be kept around, please speak now. Thanks!




published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2009-02-24 16:22:00 in the "English" category

So I ordered a pair of in-ear phones last Sunday, to make sure I could sleep decently at least once in a while. Since my last pair, a Sennheiser, broke in less than a year (even though I did abuse them quite a bit during my various hospital stays), I decided to go with a brand a friend suggested to me, Shure.

I couldn’t find any Shure hardware in the store near me, so I ordered them through the Apple Store, which works quite well especially when it comes to rapid delivery; indeed they arrived this afternoon. I opened the package and started inspecting the box. It’s not the usual plastic blister, which is a nice change, especially given the price tag of the thing, but when I felt around it to check the labels (it’s something I do almost without thinking nowadays), I noticed something quite scary:

Warning label

This is the warning label on my new Shure SE210 earphones' box. If you cannot read it clearly it says: "This product contains a chemical known to the State of California to cause cancer and birth defects or other reproductive harm."

No clue about which chemical it is, whether the danger comes from ingesting it, drinking it, or just using the earphones, or any other specifics. I called the Apple Store just to confirm the thing has the RoHS certification and they assured me it has. But still.




published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2009-02-24 12:11:00 in the "English" category

Compiler warnings are one of the most important features of modern optimising compilers. They have replaced, for the most part, the use of the lint tool to identify possible errors in source code. Ignoring too many warnings usually means that the software hasn’t been looked at and properly polished before release. More importantly, warnings can be promoted to errors over time, since they often point at obsolete code structures or unstable code. An example of this is the warning about the size mismatch between integers and pointers, which became an error with the GCC 3.4 release, to make sure that code would run smoothly on 64-bit machines (which started to become much more common).

Usually, a new batch of warnings is added with each minor or major GCC release, but for some reason the latest micro release (GCC 4.3.3) enabled a few more of them by default. This is usually a good sign, since a stricter compiler usually means that code improves with it. Unfortunately I’m afraid the GCC developers have been a little too picky this time around.

You might remember how the most important warnings cannot become errors, even though recent GCC releases support turning particular warnings into errors. But before that I also had another problem with GCC warnings, in particular warnings about system headers.

That latter problem was quite nasty, since by default GCC would hide from me cases where my code defined some preprocessor macro that the system headers were also going to define (maybe with different values). For that reason, I started building xine-lib with -Wsystem-headers to make sure that these cases could be handled. It wasn’t much of a problem, since the system headers were always quite clean after all, without throwing warnings around. But this is no longer the case: with GCC 4.3.3 I started to get a much lower signal-to-noise ratio, since a lot of internal system headers started throwing warnings like “no previous prototype”. This is not good, and really shows how glibc and gcc aren’t well synchronised (similarly readline and gdb, but that’s a story for another day).

So okay, I disabled -Wsystem-headers since it produces too much noise, and went looking for the remaining issues. The most common problem in xine-lib that the new gcc shows is that all the calls to asprintf() ignore the return value. This is technically wrong, but it should cause few problems there, since a negative return value from asprintf() means that there is not enough memory to allocate the string, and thus most likely the program would want to abort anyway. As it is, I’m not going to give them too much weight but rather remind myself I should branch xine-lib-1.2 and port it to glib.
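
For reference, checking the asprintf() return value does not take much. The following is only a sketch with a made-up helper name, not code from xine-lib; it assumes a glibc or BSD system, since asprintf() is an extension rather than ISO C.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* asprintf() is a GNU/BSD extension, not ISO C */
#endif
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build a "name #index" label on the heap.
 * Returns a string the caller must free(), or NULL if the allocation failed. */
char *make_label(const char *name, int index)
{
    char *label = NULL;

    /* A negative return value means no string was produced; the pointer
     * must not be trusted in that case, so it is never handed back. */
    if (asprintf(&label, "%s #%d", name, index) < 0)
        return NULL;

    return label;
}
```

A caller would then test the result once, for instance aborting when make_label() returns NULL, instead of silently carrying on with a pointer whose contents are undefined.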

Unfortunately, while the “warn about unused return value” idea is quite good for many functions, it is not good for all of them. And somehow the new compiler also ignores the old workaround to shut the warning up (casting the value returned by the function to void). While, again, it is technically good to have those warnings, sometimes you just don’t care whether a call succeeded or not, because either way you’re going to proceed in the same way, so you don’t spend time checking the value (usually because you check it somehow later on). What the “warn about unused return value” should really point out is leaks due to allocation functions whose return value (the address of the newly-allocated memory area) is ignored, since there is no way you can ignore that without having an error.
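
One commonly used way around the ignored cast is to pass the value through a tiny function that discards it. The sketch below is an illustration only: MUST_CHECK, best_effort_tweak(), ignore_result() and run_tweaks() are hypothetical names, with best_effort_tweak() standing in for calls like nice() or asprintf() that glibc marks with the warn_unused_result attribute.

```c
#include <stddef.h>

#if defined(__GNUC__)
#  define MUST_CHECK __attribute__((warn_unused_result))
#else
#  define MUST_CHECK
#endif

/* Hypothetical stand-in for a library call whose result "must" be checked. */
static int best_effort_tweak(int *counter) MUST_CHECK;

static int best_effort_tweak(int *counter)
{
    if (counter == NULL)
        return -1;          /* failure the caller may not care about */
    ++*counter;
    return 0;
}

/* The value is consumed as an argument, so the compiler sees it as used. */
static inline void ignore_result(int result)
{
    (void)result;
}

int run_tweaks(void)
{
    int counter = 0;

    /* (void)best_effort_tweak(&counter);    GCC still warns about this form */
    ignore_result(best_effort_tweak(&counter));  /* succeeds, counter becomes 1 */
    ignore_result(best_effort_tweak(NULL));      /* fails, and we carry on anyway */

    return counter;
}
```

The advantage over sprinkling dummy variables around is that the intent is spelled out at the call site, and the helper can be grepped for later when somebody wants to revisit the ignored results.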

One quite stupid place where I have seen the new compiler throw a totally useless warning is related to the nice() system call; this is a piece of code from xine-lib:

#ifndef WIN32
  /* nice(-value) will fail silently for normal users.
   * however when running as root this may provide smoother
   * playback. follow the link for more information:
   * http://cambuca.ldhs.cetuc.puc-rio.br/~miguel/multimedia_sim/
   */
  nice(-1);
#endif /* WIN32 */

(don’t get me started with the problems related to this function call, it’s not what I’m concerned with right now).

As you can see there is a comment about failing silently, which is exactly what the code wants: use it if it works, don’t if it doesn’t. Automagic, maybe, but I don’t see a problem with that. But with the new compiler this throws a warning because the return value is not checked. So it’s just a matter of checking it and possibly logging a warning to make the compiler happy, no? It would be, if it wasn’t for the special case of the nice() return value, as described in the man page:

On success, the new nice value is returned (but see NOTES below). On error, -1 is returned, and errno is set appropriately.

[snip]

NOTES

[snip]

Since glibc 2.2.4, nice() is implemented as a library function that calls getpriority(2) to obtain the new nice value to be returned to the caller. With this implementation, a successful call can legitimately return -1. To reliably detect an error, set errno to 0 before the call, and check its value when nice() returns -1.

So basically the code would have to morph into something like the following:

#ifndef WIN32
  /* nice(-value) will fail silently for normal users.
   * however when running as root this may provide smoother
   * playback. follow the link for more information:
   * http://cambuca.ldhs.cetuc.puc-rio.br/~miguel/multimedia_sim/
   */
  errno = 0;
  res = nice(-1);
  if ( res == -1 && errno != 0 )
    lprintf("nice failed");
#endif /* WIN32 */

And just for the sake of argument, the only error that may come out of the nice() function is a permission error. While the check certainly doesn’t need an enormous amount of code, it really is superfluous for software that is only interested in making use of the call if it has the capability to do so. And it becomes even more important not to bother with this when you consider that xine is a multithreaded program, and that the errno interface is… well… let’s just say it’s not the nicest way to deal with errors in multithreaded software.

What’s the bottom line of this post? Well, I think that the new warnings added with GCC 4.3.3 aren’t bad per se; they are actually quite useful. But just dumping a whole load of new warnings on developers is not going to help, especially when there are so many situations where you would just be adding error messages to the application that might never be read by the user. The number of warnings raised by a compiler should never be so high that the output is filled with them, otherwise the good warnings will just get ignored and the software will not improve.

I think for xine-lib I’ll check out some of the warnings (I found at least one scary piece of code) and then I’ll either disable the warn-unused-return warning locally, or sed out the warnings about the ignored asprintf() and nice() return values. In either case I’m going to have to drop some true positives to avoid having to deal with a load of false positives I shouldn’t be caring about. Not nice.




published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2009-02-23 12:03:00 in the "English" category

I’m writing this entry while I’m waiting for my pasta to be ready, in the lunch break I’m taking between job tasks. For a series of reasons I’m going slowly at them, because life in the past few days has been difficult. Adding to my problems with sleep, my neighbours have resumed waking me up “early” in the morning (early by my standards, that is). Yes, I know that for most people 11am is not “early”, but given that I have always tended to work during the night (it’s easier not to be disturbed by family, friends, Windows users, ...), and that they know that (not to mention that my father works shifts, so he also works nights quite often), I wouldn’t expect such an amount of noise.

With “such an amount of noise” I mean that I get woken up while sleeping with my iPod playing podcasts, the headphones in, and my room’s door closed. And no, I’m not a light sleeper once I get to sleep, unless I know I have to wake up (to receive a parcel, to work remotely, or whatever). This weekend I ended up sleeping just shy of six hours per night, which is not good for my health either. I have now ordered a pair of noise-isolating in-ear phones; they should arrive tomorrow via UPS, which at least tends to be quite on time. The Italian postal service, on the other hand, which should deliver three Amazon packages to me, is taking the usual eternity.

As far as most of my readers are concerned, I still have my tinderbox running, after a few tweaks. First of all, Zac provided me with a new revision of the script that generates the list of packages that need to be upgraded (or installed) in the system; it takes good care of USE dependencies, so that when they are expressed I can see them before the system tries to merge the packages. Then I’m wondering about the order of merge. More than a couple of times I had to suspend the tinderbox run in the midst of it, so the lower end of the list of packages tended not to be merged as often as the upper end (where quite a few packages I know still fail to merge). This time I ran the tinderbox from the bottom up (more useful since the sys-* packages are lower), but next time I’m considering just sorting the list randomly before feeding it to the merge command, so that there are better chances for all the packages to be built within a couple of iterations.
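
The random ordering is nothing more than a shuffle of the package list. As a rough sketch of the idea (not the actual tinderbox script, and with a made-up function name), a Fisher-Yates shuffle over an array of package names could look like this in C:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: shuffle the package list in place (Fisher-Yates).
 * rand() with a modulo has a slight bias, which does not matter for
 * picking a build order; seed it with srand() before calling. */
void shuffle_packages(const char **packages, size_t count)
{
    if (count < 2)
        return;

    for (size_t i = count - 1; i > 0; --i) {
        /* Pick one of the entries 0..i and move it to the end of the
         * not-yet-shuffled part of the array. */
        const size_t j = (size_t)rand() % (i + 1);
        const char *tmp = packages[i];
        packages[i] = packages[j];
        packages[j] = tmp;
    }
}
```

Every package still appears exactly once, so a run that gets suspended half-way no longer always starves the same end of the list.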

Speaking of the tinderbox and packages, I noticed that there are lots of packages that waste time without good reason, doing useless processing. This includes, for instance, compressing the man pages before Portage does so. While one can understand that upstream would like to provide complete features to users, this is also a task that distributions usually pick up, and it would make sense to provide an “I’m a packager” option to skip it.

Now, you could argue that those are very small tasks, and even for packages installing ten or twenty man pages it doesn’t feel like too much time wasted. But please think of two things. First of all, the compression is often done with a for loop in sh rather than with multiple Make rules (which would be run in parallel), serialising the build and taking more time on multi-core systems. Secondly, the man pages will have to be decompressed and compressed again by Portage, so it’s about three times the work strictly needed.

Another problem is with packages that, not knowing where to find an interpreter, be it Perl, Ruby, Python or whatever else, check for it in configure or with other methods and then replace its path in all their scripts. And once again, quite often they do so using for in sh rather than Make rules. Instead of doing that they should just use /usr/bin/env $interpreter to let the system find the right one, and not hardcode it in the files at all (unless you need a specific version, but that’s another point altogether).

Well, I’ve eaten my pasta now (bigoli with pesto, for those interested), so I’ll get a coffee and be back to work. I’ll try to write a more useful blog post later today after I’m done with work. I have lots of things to write about, including something about XBMC (from a user’s standpoint; I don’t have time to approach it with the packager’s eye).




published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2009-02-21 13:43:00 in the "English" category

Although I admit it’s tempting, I’m not going to enter the mess of complaints (warranted or not) about GIT that have found their place on Planet GNOME. I don’t intend to go into what my issues are with bzr either, since I think I have explained them already. I’m going to comment on a technical issue I have with Mercurial, and show why I find GIT more useful, at least in that case.

If you remember, xine moved to Mercurial almost two years ago. Mercurial was chosen at the time because it seemed much more stable (git has indeed had a few big changes since then), it was already being used for gxine, and it had better multi-platform support (running git on Solaris at the time was a problem, for instance). While I don’t think it’s time (yet) to reconsider, especially since I haven’t been active in xine development for so long that my opinion wouldn’t matter, I’d like to share some insight into the problems I have with Mercurial, or at least with the Mercurial version that Alioth is using.

Let’s not start with the fact that hg does not seem to play too well with permissions, and the fact that we have a script to fix them on Alioth to make sure that we can all push to the newly created repositories. So if you think that setting up a remote GIT repository is hard, please try doing so with Mercurial, without screwing permissions up.

As far as the command line interface is concerned, I agree that hg follows the principle of least surprise more closely, and indeed has an interface much more similar to CVS/SVN than git has. On the other hand, it requires quite a bit of wandering around to do stuff like git rebase, and it requires enabling extensions that are not enabled by default, for whatever reason.

The main problem I have with Hg, though, is the lack of named branches. I know that the newer versions should support them, but I have been unable to find documentation about them, and anyway Alioth is not updated so it does not matter yet. Without named branches, you basically have one repository per branch. While that makes it easier to deal with multiple build directories, it becomes quite space-hungry since the history is not shared between these repositories, while it is in git (if you clone one linux-2.6 repository, then decide you need a branch from another developer, you just add that remote and fetch it, and it’ll download the minimum amount of changesets needed to fill in the history, not a whole copy of the repository).

It also makes it much more cumbersome to create a scratch branch before doing more work (even more so because you lack a single-command rebase and you need to update, transplant and strip each time), which is why sometimes Darren kicked me for pushing changes that were still work in progress.

In git, since the changesets are shared between branches, a branch is quite cheap and you can branch N times almost without feeling it; with Hg, it’s not that simple. Indeed, now that I’m working on a git mirror for the xine repositories I can show you some interesting data:

flame@midas /var/lib/git/xine/xine-lib.git $ git branch -lv
  1.2/audio-out-conversion   aafcaa5 Merge from 1.2 main branch.
  1.2/buildtime-cpudetection d2cc5a1 Complete deinterlacers port.
  1.2/macosx                 e373206 Merge from xine-lib-1.2
  1.2/newdvdnav              e58483c Update version info for libdvdnav.
* master                     19ff012 "No newline at end of file" fixes.
  xine-lib-1.2               e9a9058 Merge from 1.1.
flame@midas /var/lib/git/xine/xine-lib.git $ du -sh .
34M    .

flame@midas ~/repos/xine $ ls -ld xine-lib*   
drwxr-xr-x 12 flame flame 4096 Feb 21 12:01 xine-lib
drwxr-xr-x 13 flame flame 4096 Feb 21 12:19 xine-lib-1.2
drwxr-xr-x 13 flame flame 4096 Feb 21 13:00 xine-lib-1.2-audio-out-conversion
drwxr-xr-x 13 flame flame 4096 Feb 21 13:11 xine-lib-1.2-buildtime-cpudetection
drwxr-xr-x 13 flame flame 4096 Feb 21 13:12 xine-lib-1.2-macosx
drwxr-xr-x 12 flame flame 4096 Feb 21 13:28 xine-lib-1.2-mpz
drwxr-xr-x 13 flame flame 4096 Feb 21 13:30 xine-lib-1.2-newdvdnav
drwxr-xr-x 13 flame flame 4096 Feb 21 13:50 xine-lib-1.2-plugins-changes
drwxr-xr-x 12 flame flame 4096 Feb 21 12:53 xine-lib-gapless
drwxr-xr-x 12 flame flame 4096 Feb 21 13:56 xine-lib-mpeg2new
flame@midas ~/repos/xine $ du -csh xine-lib* | grep total
805M    total
flame@midas ~/repos/xine $ du -csh xine-lib xine-lib-1.2 xine-lib-1.2-audio-out-conversion xine-lib-1.2-buildtime-cpudetection xine-lib-1.2-macosx xine-lib-1.2-newdvdnav  | grep total
509M    total

As you might guess, the contents of ~/repos/xine are the Mercurial repositories. You can see the size difference between the two SCMs: 34M for the git repository with all six branches, against 509M for the six corresponding Mercurial repositories. Sincerely, even though I have tons of space, on the server I’d rather keep git than Mercurial.

If some Mercurial wizard knows how to work around this issue I have with Mercurial, I might consider it again; otherwise, for the future it’ll always be git for me.




published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2009-02-19 18:49:00 in the "English" category

There is one particular topic that was in my TODO list of things to write about in the “For A Parallel World” series, and that topic is recursive make, the most common form of build system in use in this world.

The problems with recursive make have been known since at least 1997, thanks to the almost famous Recursive Make Considered Harmful paper. I suggest reading the paper to everybody who’s interested in the problems of parallelising build systems.

As it turns out, automake supports non-recursive forms quite well, and indeed I use it on at least one project of mine. Lennart also uses it in PulseAudio, as all the modules are built in the same Makefile.am file even though their sources (even the generated ones) are split among different directories.

Unfortunately the paper has to be taken with a grain of salt. The reason I’m saying that is that, if you read it, there are at least a couple of places where the author seems to get his make rules wrong and defines a rule with two output files, and one where he ignores temporary file naming problems.

There are, of course, other solutions to this problem; for instance I remember the FreeBSD make command being able to descend into directories in parallel just fine. But I sincerely think that here we have to stop one particular problem first. I don’t have too many problems with recursive make for different binaries or final libraries, or for libraries that are shared among targets. Yes, they tend to put more serialisation into the process, but it’s not tremendously bad, especially not for the non-parallel case, where the problem does not seem to appear for most users at all.

What I think is a problem, and I seriously detest it, is when the only reason to use recursive make is to keep the layout of the built object files the same as the source files, and then create sub-directories for logical parts of the same target. With automake this tends to require the creation of noinst convenience libraries that are built and linked against, but never installed. While this works, it tends to tremendously increase the complexity of the whole build, and the time required to build, since sometimes they get compiled not only into archive files (.a static libraries) but also into final ELF shared objects, depending on their use. Since we know that linking is slow, we should try to avoid doing it for no good reason, don’t you think?

In general, the presence of noinst_LTLIBRARIES means that either you’re grouping sources that will be used just once in a recursive make system, or you’re creating internal convenience libraries, which might be even more evil than that, since they can create two huge binaries, as in the case of Inkscape.

Once again, if you want your build system reviewed, feel free to drop me an email; depending on how much time I’ve got I’ll do my best to point out possible fixes, or actually fix it.




published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2009-02-16 02:19:00 in the "English" category

You might remember that some time ago I proposed blocking old user agents. While I wasn’t able to get around to implementing this idea Typo-side, with proper warnings and an interface for the users, the move to Apache that followed allowed me to implement my idea for real using mod_security.

While I think the default ruleset in mod_security is quite anal-retentive, and stops me from posting most of my technical blogs (and related comments) by disallowing strings like /etc, the thing is tremendously powerful. I’m (ab)using it to stop requests for PHP pages from hitting Typo (the server is not going to use PHP any time soon), which together with mod_rewrite reduces the load on the server itself.

To implement my idea (which has actually been live on this blog for quite a while, and was refined further today), I first observed the behaviour of most spam comments; it turned out that I could identify some common patterns which made it really easy to write some rules. While they cannot remove all the spam, they have a near-zero false positive rate, and they increased the signal-to-noise ratio to the point that I was able to restore comments on all the thousand posts on this blog (actually nearly a thousand, but that’s good enough for me), spanning about three years of my Gentoo and Free Software work. Before, it was a stretch to keep them enabled for posts older than 45 days, and it was difficult to manage.

Anyway the first point to make is that only the comment posting should be blocked. I don’t care about the spammers browsing my blog, at the worst they would poison my AWStats output, but that’s password protected and will not cause Google spam. So I wrote all the SecRule entries directly in the virtual host definition inside a LocationMatch block. This should also reduce the per-request work that Apache and the module have to do.

Now, as for the actual rules, I first decided to disallow postings from browsers that are blatantly too old, like the ones describing themselves as Mozilla/1 to Mozilla/3, or Firefox/0 and Firefox/1 (besides, didn’t Firefox change name after release 1?):

SecRule REQUEST_HEADERS:User-Agent "(mozilla/[123])|(firefox/[01])" \
    "log,auditlog,msg:'User-Agent too old to be true, posting spam comments.',deny,status:403"

Then I started removing “strange and fake” User-Agents, like the ones reporting a Mozilla type with a non-zero decimal value, and then User-Agents which included a certain spyware.

SecRule REQUEST_HEADERS:User-Agent "(mozilla/[45]\.[1-9]|FunWebProducts)" \
    "log,auditlog,msg:'User-Agent sounds fake, posting spam comments.',deny,status:403"

I sincerely wonder how many false positives the above rule produces; none on my blog, but on more Windows-focused blogs it might not work that well. I’m not sure whether the spyware on the system causes IE to be hijacked to produce spam comments, or if the spam comments just appear to use the same User-Agent, but on the whole I guess a user that browses with such software is a user I don’t really want to hear comments from.

Together with that spyware there seem to be more (jeez, do people on Windows really install any crap sent their way? I’m glad I’m using Linux and OSX!); again I’m not sure whether the spammers use generated User-Agents that include them, whether they hijack the browser directly, or whether systems that already have that kind of spyware are more likely to be subject to other kinds of spyware too.

The next rule kills a lot more spam bots and more spyware-full browsers, by removing any User-Agent with a URL in it. I haven’t found any legit User-Agent that lists a URL, at least not for browsers. Crawlers do, but they don’t post comments.

# Bots usually provide an http:// address to look up their
# description, but those don't usually post comments. Consider any
# comment coming from a similar User-Agent as spam.
SecRule REQUEST_HEADERS:User-Agent "http://" \
    "log,auditlog,msg:'User-Agent spamming URLs, posting spam comments.',deny,status:403"

Then I noticed a huge amount of spam comments coming with HTTP version 1.0, but with the User-Agent of browsers that support HTTP/1.1 well, and which I’m sure request pages with that version. The only browser I could find that legitimately uses HTTP/1.0 to post comments is lynx, so I whitelisted it explicitly:

SecRule REQUEST_PROTOCOL "!^http/1\.1$" \
    "log,auditlog,msg:'Host has to be used but HTTP/1.0, posting spam comments.',deny,status:403,chain"
SecRule REQUEST_HEADERS:User-Agent "!lynx"

The next observation showed that a lot of User-Agents used to post comments had a common error in them: spaces were URL-encoded, not with the usual %20, but with +, as is sometimes done. So I decided to kill those at once as well:

SecRule REQUEST_HEADERS:User-Agent "^mozilla/4\.0\+" \
    "log,auditlog,msg:'Spaces converted to + symbols, posting spam comments.',deny,status:403"

This already removed a huge amount of the spam, and I used it until today. Then, after one more month of observation, I found that a lot of spam, and no good comments, came from old default browsers on Windows, or at least pretended to. This included IE6 under Windows XP and IE5 under Windows 2000. So I decided to disallow all the posts from the first case (I’m expecting Windows XP users to get a decent browser, or if they cannot, to get at least IE7), and then from all the older versions of Internet Explorer, from 2 (yes, sometimes it still hits!) to 5:


# We expect Windows XP users to upgrade at least to IE7. Or use
# Firefox (even better) or Safari, or Opera, ...
#
# All the comments coming from the old default OS browser have a high
# chance of being spam, so reject them.
SecRule REQUEST_HEADERS:User-Agent "msie 6\.0; windows nt 5\.1" \
    "log,msg:'IE6 on Windows XP, posting spam comments.',deny,status:403"

# Also ignore comments coming from IE 5 or earlier since we don't care
# about such old browsers. Note that Yahoo feed fetcher reports itself
# as MSIE 5.5 for no good reason, but I don't care since it cannot
# post comments anyway.
SecRule REQUEST_HEADERS:User-Agent "msie [12345]" \
    "log,msg:'.',deny,status:403"

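To try the User-Agent patterns against a log without touching the server, their logic can be mirrored outside of mod_security. The following C sketch is only an approximation: it uses POSIX regular expressions rather than mod_security’s own engine, it assumes matching is meant to be case-insensitive (the rules are written in lowercase), it escapes the literal . and + characters, and it leaves out the HTTP/1.0 rule since that one looks at the protocol rather than the User-Agent. The function name is made up.

```c
#include <regex.h>
#include <stddef.h>

/* Patterns mirroring the User-Agent rules, as POSIX extended regexes. */
static const char *const spam_patterns[] = {
    "(mozilla/[123])|(firefox/[01])",        /* too old to be true */
    "mozilla/[45]\\.[1-9]|FunWebProducts",   /* fake versions, spyware */
    "http://",                               /* crawlers do not comment */
    "^mozilla/4\\.0\\+",                     /* spaces encoded as + */
    "msie 6\\.0; windows nt 5\\.1",          /* IE6 on Windows XP */
    "msie [12345]",                          /* IE5 and earlier */
};

/* Hypothetical helper: returns 1 if any pattern matches the User-Agent,
 * 0 if none does, and -1 if a pattern fails to compile. */
int user_agent_looks_like_spam(const char *user_agent)
{
    const size_t count = sizeof(spam_patterns) / sizeof(spam_patterns[0]);

    for (size_t i = 0; i < count; ++i) {
        regex_t re;
        if (regcomp(&re, spam_patterns[i],
                    REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
            return -1;

        const int rc = regexec(&re, user_agent, 0, NULL, 0);
        regfree(&re);

        if (rc == 0)
            return 1;
    }

    return 0;
}
```

Feeding it the User-Agent column of an access log gives a quick idea of how many legitimate visitors the patterns would have turned away before deploying them.
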
Now, describing these rules can be a bit controversial, since making them public also means that the developers of spam bots can now learn some more things to avoid. But I decided to do it anyway, for a few reasons I deem good enough.

The first is that I’m sure that a lot of spam bot users don’t care to update their code at all, and rely on the sheer amount of posting. Anybody with a minimum of knowledge of the web can figure out how to reduce the difference between the User-Agents they send and the ones that real users actually have. Then there is the hope that knowing these problems can help someone else reduce the amount of spam just as well.

Finally, today Reinhard and Darren, while discussing the new xine website, brought up the bus factor, which in my case actually morphs into the pancreas factor. It is actually true that, given my past two years, I could disappear, literally dead, without notice. While thinking of this actually depresses me to the point where I wish I had never worked in Free Software, I need to work around the problem by documenting processes and so on.

In the next week, given I don’t have job-related tasks to direct my attention towards, I’ll try to document all the scripts used for the site generation, the configuration files for Apache, the cron jobs regenerating the script, and so on and so forth. It’s going to be a massive amount of documentation to write, but I have been doing that for Gentoo-related stuff for a while already.

Sigh, now I really wish I had never embarked on this quest to begin with.



published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2009-02-13 19:26:00 in the "English" category

After spending a day working on the new website I’ve been able to identify some of the problems and design some solutions that should produce good enough results.

The first problem is that the original site not only used PHP and a database, but also misused them a lot. The usual way to use PHP to avoid duplicating the style is to get a generic skin template, and then use one or more scripts per page that get included in the main one depending on parameters. This usually results in a mostly-working site that, while doing lots of work for nothing, still does not bog down the server with unneeded work.

In the case of xine’s site, the whole thing loaded either a static HTML page that got included or a piece of PHP code that would define variables, which, by means of the main script, would then be replaced in a generic skin template. The menu would also not be written once, but generated on the fly for each page request. And almost all the internal links in the pages would be generated by a function call. Adding the padding to the left side of the menu entries for sub-pages was done by creating (with PHP functions) a small table before the image and text that formed the menu link. In addition to all this, the SourceForge logo was cycling on a per-second basis, which meant that a user browsing the site would load about six different SourceForge images in the cache, and that no two requests would have got the same page.

The download, release, snapshots and security pages loaded the data on the fly from a series of flat files that contained some metadata about them, and that then produced the output you’d have seen. And to add a client-side waste of time to what was already a waste of time on the server side, the changes in shade of the left-hand menu were done using JavaScript rather than the standard CSS2 :hover option.

Probably because of the bad way the PHP code was written, the site had all the crawlers stopped by robots.txt, which is a huge setback for a site aiming to be public. Indeed, you cannot find it in Google’s cache because of that, which meant that last night I had to work with the Wayback Machine to see how the site appeared earlier. And that copy was from one year ago, not what we had a few weeks ago. (This has since stopped being a problem, since Darren gave me a static snapshot of the thing as seen on his system.)

To solve these problems I decided a few things for the new design. First of all, as I’ve already said, it has to be entirely static after modification, so that the files served are just the same for each request. This includes removing visit counters (who cares nowadays, really), and the changing SourceForge logo. This ensures that crawlers and users alike will see the exact same content over time if it doesn’t change, keeping caches happy.

Also, all the pages will have to hide their extensions, which means that I don’t have to care whether the page is .htm, .html or .xhtml. Just like on my site, all the extensions will be hidden, so even a switch to a different technology will not invalidate the links. Again, this is for search engines and users alike.

The whole generation is done with standard XSLT, without implementation-dependent features, which means it’ll work with libxslt just like with Saxon or anything else, although I’m going to use libxslt for now since that’s what I’m using for my site as well. By using standard technologies it’s possible to reuse them in the future without relying on specific versions of libraries and similar. And thanks to the way XSLT has been designed, it’s very easy to decouple the content from the style, which is exactly what a good site should do to be maintainable for a long time.

Since I dislike custom solutions, I’ve been trying very hard to avoid using custom elements and custom templates outside the main skin; the idea is that XHTML usually works by itself, and adding a proper CSS will take care of most of the remaining stuff. This isn’t too difficult once you get around the problem that the original design was entirely based upon tables rather than proper div elements, and the whole thing has been manageable.

Besides, with this method adding a dynamically-generated (but statically-served) sitemap is also quite trivial, since it’s just a different stylesheet applied over the same general data for the rest of the site.

Right now I’m still working on fixing up the security page, but the temporary not-yet-totally-live site is available for testing and the repository is also accessible if you wish to see how it’s actually implemented. I’ve actually added some sophistication to the xine site that I didn’t use for my own, but that will come with time.

The site does not yet validate properly, but the idea is that it will once it’s up; I “just” need to get rid of the remaining usage of tables.



published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2009-02-13 17:06:00 in the "English" category

I’m not much of a web development person. I try to keep my fiddling with my site to a minimum and I focus most of my writing on this blog so that it’s all kept in the same place. I also try not to customise my blog too much, besides not having it appear like any other Typo-based blog (the theme is actually mostly custom). For the design of both the site and the blog I relied on OSWD and adapted the designs found there.

I also tend not to care about webservices, webapplications and all that related stuff; it’s out of my sphere. I also try not to comment on web-centric news since I sincerely don’t care. But unfortunately, like most developers out there, I often get asked about possibilities with webapplications, site development, and so on and so forth.

For this reason, I came to be quite opinionated, and probably against the majority of the components who “shape” the net as it is now.

One of my opinions is that you shouldn’t use on-request generated pages for static content, which is what most sites do, with CMSs, Wikis, no-comment blogs and stuff like that. The only reasons why I’m using a web application for my blog are that first of all I happen to write entries while I’m on the go, and second I allow user comments, which is what makes it a blog rather than a crappy website. If I didn’t allow comments, I would have no reason to use a webapplication and could probably just do with a system fetching the entries from an email account.

Another opinion is that you shouldn’t reinvent the wheel because it’s cool. I’m sincerely tired of the number of sites that include social networking features like friendship and similar. I can understand it when it’s the whole idea of the site (Facebook, FriendFeed) but do I care about it on sites like Anobii? (On the other hand, I’m glad that Ohloh does not have such a feature.)

I’ve been asked at least three times about developing a website with social networking features, with friendship and the stuff, and two out of three times, the target of the projects were “making money”. Sure, okay, keep on trying.

Every other site out there has a CMS to manage the news entries, which could also be acceptable when you have a huge archive and the ability to search through it, but do I need to know which hour it is right now? I have a computer in front of me, I can check it on that (unless of course I’m looking to find out if it’s actually correctly synchronised). Does every news or group site have to have a photo gallery with its own software on it? There are things like Picasa and Flickr too.

But one thing I sincerely loathe is all the sites that are up with Trac or MediaWiki to provide some bit of content that rarely needs to be edited. Even the FreeDesktop.org site is basically a big huge wiki with the developers having write access. Why, I don’t know, since you can easily make the thing use DocBook and process the files with a custom stylesheet to produce the pages shown to the user. It’s not like this is overly complex, especially when just a subset of the people browsing the site have access to edit it.

Similarly, I still wonder why every other WordPress blog requires me to register to the main WordPress site to leave comments. I can understand Blogger and LiveJournal requiring a login either with them or OpenID (and I use my Flickr/Yahoo OpenID for that) but why should I do that on a per-site basis, repeatedly?

But even counting that in, I’m tired of the amount of sites that just duplicate information. Why was xine’s site having its own “security advisory” kind of thing? It’s not like we’re a distribution. Thankfully, Darren started just using the assigned CVE numbers since a few years ago so there is no further explosion of pages. Hopefully, I can cut out some of the pointless content of the site to reduce it.

In the day of I-do-everything sites, I’m really looking forward to smaller, tighter sites that only provide the information they have to, instead of duplicating it over and over again. The good web is the light web.



published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2009-02-12 23:26:00 in the "English" category

As it turns out, the usual xine website went offline a few days ago. Since then, Darren set up a temporary page on SourceForge.net servers, and I’ve changed the redirect of xine-project.org, which is now sorta live with the same page that there was on SourceForge.net, and the xine-ui skins ready to be downloaded.

Since this situation cannot be left like this for much longer, I’ve decided to take up the task of rebuilding the site on the new domain I’ve acquired to run the Bugzilla installation. Unfortunately the original site (which is downloadable from the SourceForge repositories) is written in PHP, with use of MySQL for user-polling and news posting, and the whole thing looks like a contraption I don’t really want to run myself. In particular, the site itself is pretty static; the only real use of PHP on it is not having to write boilerplate HTML for each release, but rather write a file describing them, which is something that I’m used to doing myself for my site.

Since having a dynamic website for static content is far from my usual work practises, I’m going to do just what I did for my own website: rewrite it in XML and use XSLT to generate the static pages to be served by the webserver. This sounds complex but it really isn’t, once you know the basic XML and XSLT tricks, which I’ve learnt, unfortunately for me, with time. On an interesting note, when I’ve worked on my most complex PHP project, which was a custom CMS – when CMS weren’t this widespread! – for an Italian gaming website, now dead, I already looked into using XSLT for the themes, but at the time the support for it in PHP was almost never enabled.

I’m still working on it and I don’t count on being able to publish it this week, but hopefully once the site is up again it’ll be entirely static content. And while I want to keep all the previously-available content, and keep the general design, I’m going to overhaul the markup. The old site is written mostly using tables, with very confused CSS and superfluous spacer elements. It’s not an easy task but I think it’s worth doing, especially since it should be much more usable for mobile users, of which I’m one from time to time.

If I find some interesting technicality while preparing the new website I’m going to write it here, so keep reading if you’re interested.



published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2009-02-11 20:05:00 in the "English" category

So I finally went to get the new disks I ordered, or rather I sent my sister since I’m at home sick again (seems like my health hasn’t recovered fully yet). I ordered two WD SATA disks, two Samsung SATA disks and an external WD MyBook Studio box with two disks, with USB, FireWire and eSATA interfaces. My idea was to vary the type and brand of disks I use so that I don’t end up having problems when exactly one of them goes crazy, like it happened with Seagate’s recent debacle.

The bad surprise started when I tried to set up the MyBook; I wanted to set it up as RAID1, to store my whole audio/video library (music, podcasts, audiobooks, tv series and so on so forth), then re-use the space that is now filled with the multimedia stuff to store the archive of downloaded software (mostly Windows software, which is what I use to set up Windows systems, something that I unfortunately still do), ISO files (for various Windows versions, LiveCDs and stuff like that), and similar. I noticed right away that contrary to the Iomega disk I had before, this disk does not have a physical hardware switch to enable raid0, raid1 or jbod. I was surprised and a bit appalled, but the whole marketing material suggests that the thing works fine with Mac OS X, so I just connected it to the laptop and looked for the management software (which is inside the disk itself, rather than on a different CD, that’s nice).

Unfortunately, once the software was installed, it turned out it had failed to install itself in the usual place for Applications under OS X, and it also failed to detect the disk itself. So I went online and checked the support site; there was an update to both the firmware of the drive (which means the thing is quite a bit more complex than I’d expect it to be) and to the management software. Unfortunately, neither solved my issue, so I decided it had to be a problem with Leopard, and tried with my mother’s iBook, which is still running Tiger; still no luck. Not even installing the “turbo” drivers from WD solved the problem.

Now I’m stuck with a 1TB single-volume disk set which I don’t intend to use that way, I’ll probably ask a friend to lend me a Windows XP system to set it up, and then hope that I’ll never have to use it, but the thing upsets me. Sure from a purely external hardware side it seems quite nice, but the need for software to configure a few parameters, and the fact that there is no software to do so under Linux, really makes the thing ludicrous.



published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2009-02-06 20:25:00 in the "English" category

I’ve been having some nasty problems with Yamato, with the two ethernet interfaces disappearing from time to time at boot, which isolated my bedroom and all its equipment, since one of those is bridging that to my wireless network (through use of two eth-over-powerline adapters). When I reported these problems to Tyan (the mainboard manufacturer) they suggested that I replace the PSU and upgrade from 550W to at least 650W. Since I had to order some stuff from my German supplier I decided to add to that order a 750W PSU from be quiet! (the same brand as the old one). It arrived yesterday.

The surprise was that when I took it out, the connector on the back of the unit was not the standard “kid home-shaped” connector that we’re used to (at least in Europe), the IEC C13/C14 couple, but rather the much less common “depressed face-shaped” (thanks to Joshua for the name!) IEC C19/C20 couple, which is rated at 16A 250V rather than the much more usual 10A 250V of the former.

While I still don’t think there is need for such a huge cable (16A at 220V is about 3.5kW and I don’t even have enough power for that in my house), this is a bit of a showstopper since I cannot plug it into my UPS as it is; I’ll have to wire up a converter. Which is not difficult given I already have a C14 lead in my drawers; the problem is that I need a C19-C20 cable that I have no idea where to find in my city. Oh well, I have next week to find it before the new disks which I ordered arrive.

As for the differences in Wikipedia, it’s fun to note that while the English C19 page is a disambiguation page that, among other options, leads to the IEC connector page above, the French C19 page instead decides to be less technical by referring directly to a DragonBall character. Thanks to Gilles for pointing that out.



published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2009-02-03 12:28:00 in the "English" category

Today’s interesting reading is certainly Stormy Peters’ post about hypothetically open-sourcing Windows; while I agree with the conclusion that Windows is unlikely to get open sourced any time soon, I sincerely don’t agree on other points.

First of all, I get the impression that she’s suggesting that the only reason Linux exists is to be a Free replacement for Windows, which is certainly not the case; even if Windows were open-source by nature, I’m sure we’d have Linux, and FreeBSD, and NetBSD, and OpenBSD, and so on and so forth. The reason for this is that the whole architecture behind the system is different, and is designed to work for different use-cases. Maybe we wouldn’t have the Linux desktop as we know it by now, but I’m not sure of that either. Maybe the only project that would then not have been created, or that could be then absorbed back into Windows, would be ReactOS.

Then there is another problem: confusing Free Software and Open Source. Even if Microsoft open-sourced Windows, adopting the same code would likely not be possible even for projects like Wine and ReactOS that would be able to use it as it is, because the license might well be incompatible with the rest of them.

And by the way, most of the question could probably be answered by looking at how Apple open sourced big chunks of its operating system. While there is probably no point in even trying to get GNU/Darwin to work, the fact that Apple releases code for most of its basic operating system does provide useful insights for stuff like filesystem hacking and even SCSI MMC commands hacking, even just by being able to read its sources. It also provides access to the actual software, which for instance gives you access to the fsck command for HFS+ volumes on Linux (I should update it by the way).

Or, if you prefer, by looking at how Sun created OpenSolaris, although one has to argue that in the latter case there is so much more similarity with Linux and the rest of the *BSD systems that it says very little about how a similar situation with Windows would turn out. And in both cases, people still pay for Solaris and Mac OS X.

In general, I think that if Microsoft were to open-source even just bits of its kernel and basic drivers, the main advantages would again come out of filesystem support (relatively, since the filesystems of FreeBSD, Sun Solaris, NetBSD and OpenBSD are really not that well supported by Linux already), and probably some ACPI support that might be lacking in Linux for now. It would be nice, though, if stuff like WMI would then be understandable.

But since we know already that open-sourcing Windows is something that is likely to happen in conjunction with the Duke Nukem Forever release, all this is absolutely absurd and should not be thought about too much.



published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2009-01-31 21:35:00 in the "English" category

I’ve already written about my buying hardware from Germany for most of my non-urgent hardware needs, with the exclusion of hard drives since they tend to fail on me quite often (by the way, with the cold and everything I didn’t get around to ordering my new hard disks; I’ll see to that next week since I really need them). Not only that, but sometimes I get friends asking me if I can fetch them something at a lower price than they’d pay here.

It obviously has disadvantages: it takes time for the stuff to arrive, it takes even more if something is faulty, but all in all most of my friends are glad they can save some money since they rarely have urgent needs for computers, as they mostly play with them and rarely need them for work or school. At any rate, the whole thing is sometimes distressing for me, though I got used to shipment delays and similar, which are inherent to this way of buying stuff. On the other hand, it keeps me up to date with consumer hardware, which I don’t deal with myself lately, since I currently use Apple laptops and I got higher grade hardware for my workstation.

In 2008 I had to build two computers, for two friends of mine in the same household, and to cut down on the amount of work to do, I chose to use mostly-similar hardware for the two of them, which consisted mainly in using the same wireless card and the same motherboard. Unfortunately one bought a 32-bit XP Pro license, and the other a 64-bit XP Pro license, which caused quite some difference between the installation of the two of them, but it wasn’t that bad. (Yes I know XP is evil, I know it all too well, so I usually provide installation media for Linux as well, but since they need it to play or do 3D rendering, I don’t argue about it; their loss not to use a Free operating system I guess.)

When the time came to order the two boxes that I received last week, I decided to use the same motherboard again, a Gigabyte GA-MA770-DS3, which turned out pretty neat for both the other two boxes. Again one friend bought a 32-bit XP and the other 64-bit XP. Although I ordered the same exact product code, the motherboards I received are quite different: hardware revision 2.0 rather than 1.0, with different drivers even for AHCI, to the point that it wouldn’t work with the install CDs I prepared for the other two. Okay, no big deal.

But the problems with XP (the huge amount of them) are not what I’m interested in talking about. Instead I’m interested in the way they evolved the hardware on a very surface level between the two revisions. This motherboard has an interestingly huge amount of USB ports which makes it pretty useless to use USB hubs, with the exception of having ports right over your desk rather than behind your computer; before, you had six leads for them internally, two connecting to the front in most computers and one behind on a bracket over a PCI slot or similar.

The new revision instead has four leads internally, and exposes the rest of them on the backplate; to do that they removed the serial port, which is now a lead on the motherboard. It reminded me so much of old AT motherboards, where the only connection on the backplate was the DIN keyboard connection and you had to use brackets (or the chassis’s cut) for the other connections. The interesting bit is that they don’t give you a bracket with the serial port, nor do they give you one with the parallel port, which already on rev 1.0 was not present on the backplate. It’s interesting to note how the old DB-25/DE-9 is disappearing step by step from our computers. Luckily neither of my friends wanted serial or parallel ports, but even if they did I think I still have some connectors around.

When I was given my first ‘modern’ computer, a Pentium 133 MHz, in 1997, it was standard to have a gameport/MPU-401 port on board for MIDI and joysticks; since the introduction of joysticks with three axes, and gamepads with eight or more buttons, this started to be quite a problem though. Even more so when Sony introduced the DualShock line, which had twelve buttons, a D-pad (which means another four buttons) and two analog sticks (which again changed with the coming of the PlayStation 3), and PC gamepad manufacturers had to find ways to do something as good, which turned out to be USB gamepads. Since the MIDI hardware was already not that common at the time, this was one of the first things to disappear, and software support for it started to bitrot; I remember having to patch my kernel for a while with my Athlon 1000, which had an integrated MPU-401 port, because it failed to initialize it properly. From AC’97 onward, I don’t think I have seen another gameport on computers I fixed, which said “good bye” to one format.

The second port to say bye has been the parallel port, which was used by the old line printers, with their more or less complex children and with its nightmares to work with on Linux. I was never able to control the parallel port decently from Linux because most of the sample code for that was designed to be used with DOS or Windows 95 and used the I/O ports directly, which is not really a good thing. Interestingly enough, the 8255 interface that was used for those was used at my school for the electronics laboratory, although it then had a full 24-bit interface, rather than the register-based interface that EPP ports have. Enterprise had a parallel port, Yamato does not have it.

The third port is the serial port, and I can guess quite well why they are taking it away: on most modern desktop systems it’s useless, nobody would ever think of making use of it. Even analog modems nowadays use USB interfaces through CDC. Luckily for me, Yamato still has two, one of which is wired to the SBMC board for IPMI, but even this way I have not one but three USB RS-232 adapters connected to Yamato (one of which only recently got supported by Linux; admittedly, only two are connected USB-side, the other is connected serial-port side and I use it for connecting to the serial console from the laptop, which has no serial port at all).

The only remaining D-subminiature you still find on modern computers is the VGA port, but even that is going away soon; it is already gone from most laptops, and it’s getting replaced by DVI and HDMI ports. On one of the two computers, the video card is a dual-DVI one, which means it won’t have any D-sub port on the back of it. Sheesh!

I know this post is mostly useless, but it really made me feel old, I have to say.



published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2009-01-30 15:19:00 in the "English" category

There are a lot of reasons to use autotools over custom buildsystems; most of them relate to the fact that autotools contain reusable generic code that provides users with common ways of doing the same thing across different projects. Almost all of these features are available in one form or another on the majority of buildsystems, but they are often suboptimally documented, if at all, and they differ from project to project, which makes it very difficult to deal with them in a common abstract way.

One of these features is the set of options to enable and disable features and optional dependencies, to avoid automagic dependencies, which are a problem for advanced users and from-source distributors like us. While BSD makefiles have their knobs, and most software provides access through make variables or preprocessor macros to enable or disable features and other things, the configure script generated by autoconf provides both a common interface to those options, and documentation for them.

There are, though, a few interesting notes about this very useful common interface because a lot of projects either misuse it, or don’t know how deep the autoconf boilerplate is, and reinvent parts of it with no good reason. Let me try to show some of the interesting bits about it.

The first thing to know about the two AC_ARG_ENABLE and AC_ARG_WITH macros is that their arguments are: name, description, if present, if not present. The common mistake here is to consider the last two arguments as if enabled and if disabled; I’ve written about that mistake already a few years ago. This is not the case: an option that is present can still carry a negative value, and thus checking whether the option is enabled or disabled will have to be done outside of the option declaration for completeness.

Another interesting note is that the third parameter, if omitted, will by default generate a $enable_name or $with_name variable with the content of the specified option at configure (defaulting to yes and no for the positive and negative options when an explicit parameter is not passed through). It is thus possible to get a default-enabled feature variable using code like this:

AC_ARG_ENABLE([feature], [...], , [enable_feature=yes])

AS_IF([test "$enable_feature" = "yes"], [...])

Which is very handy since it avoids having to create custom variables and checking for them repeatedly (again, this is already written in my automagic guide).
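The argument semantics described above can be modelled outside of autoconf. The following Python sketch is only an illustration, not how configure is actually implemented: resolve_enable is a hypothetical helper that mimics how $enable_feature ends up set depending on the command line, with the fourth macro argument (enable_feature=yes in the example) applied only when the option is not given at all.

```python
def resolve_enable(name, argv, if_not_given="yes"):
    """Model the value of $enable_<name> after option parsing.

    --enable-NAME        -> "yes"
    --enable-NAME=VALUE  -> VALUE
    --disable-NAME       -> "no"
    (option absent)      -> if_not_given, i.e. the fourth macro argument
    """
    value = None  # None means the option never appeared on the command line
    for arg in argv:
        if arg == f"--enable-{name}":
            value = "yes"
        elif arg.startswith(f"--enable-{name}="):
            value = arg.split("=", 1)[1]
        elif arg == f"--disable-{name}":
            value = "no"
    return if_not_given if value is None else value


if __name__ == "__main__":
    for argv in ([], ["--enable-feature"], ["--disable-feature"],
                 ["--enable-feature=no"]):
        print(argv, "->", resolve_enable("feature", argv))
```

The last case is the reason the third argument cannot be treated as "if enabled": --enable-feature=no counts as the option being present, yet the resulting value is no.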

In the example above I explicitly skipped writing the documentation of the option itself, since that is another point where a lot of projects have confusion. If you look at the output of ./configure --help, the default options are all well aligned and make use of as much space as is available on the terminal window you’re running it in. On the other hand, some projects’ custom options are instead badly aligned, tightened down on a side of the screen, split among multiple lines, or going over the horizontal boundary of the terminal. This is because the upstream developers tried to fake the same alignment of autoconf, without knowing that what it does is usually adapting to the actual output of the system.

So for instance you got stuff like this, coming from the gnumeric configure script:

  --enable-compile-warnings=[no/minimum/yes/maximum/error]
                          Turn on compiler warnings
  --enable-iso-c          Try to warn if code is not ISO C
--disable-ssconvert        Do not build ssconvert (command line spreadsheet conversion tool)
--disable-ssindex        Do not build ssindex (spreadsheet indexer for beagle)
--disable-ssgrep        Do not build ssgrep (search for supplied strings in spreadsheet)
--disable-solver  Don't compile the solver
--enable-plugins="text html"  Compile only the listed plugins
  --enable-pdfdocs        Generate documentation in Portable Document Format

As you can see there are multiple lines that are aligned totally to the left, and that go long to the right, while some others try to align themselves, and keep some space to their left too. The reason can be found in the configure.in file:

AC_ARG_ENABLE(ssconvert,
  [--disable-ssconvert          Do not build ssconvert (command line spreadsheet conversion tool)],
  [], [enable_ssconvert=yes])
AM_CONDITIONAL(ENABLE_SSCONVERT, test x"$enable_ssconvert" = xyes)

While this time the third and fourth arguments are correct, there is something up with the second, that is the description of the option. It’s totally expanded in the configure file, spaces included. But since it would be silly to waste space and readability that way, autoconf already provides an easy way to deal with the problem, which is to use the AS_HELP_STRING macro (formerly AC_HELP_STRING):

AC_ARG_ENABLE(ssconvert,
  AS_HELP_STRING([--disable-ssconvert], [Do not build ssconvert (command line spreadsheet conversion tool)]),
  [], [enable_ssconvert=yes])
AM_CONDITIONAL(ENABLE_SSCONVERT, test x"$enable_ssconvert" = xyes)

which then produces:

  --enable-compile-warnings=[no/minimum/yes/maximum/error]
                          Turn on compiler warnings
  --enable-iso-c          Try to warn if code is not ISO C
  --disable-ssconvert     Do not build ssconvert (command line spreadsheet
                          conversion tool)
  --disable-ssindex       Do not build ssindex (spreadsheet indexer for
                          beagle)
  --disable-ssgrep        Do not build ssgrep (search for supplied strings in
                          spreadsheet)
  --disable-solver        Don't compile the solver
  --enable-plugins="text html"
                          Compile only the listed plugins
  --enable-pdfdocs        Generate documentation in Portable Document Format

It looks nicer, doesn’t it?
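To make the layout rule visible, here is a rough Python imitation of the formatting shown above; it is not autoconf's implementation. The help_string function is a hypothetical stand-in, and the two column values (descriptions starting at column 26, lines no longer than 78 characters) are read off the output quoted above rather than taken from autoconf's sources.

```python
import textwrap

INDENT_COLUMN = 26  # column where descriptions start in the output above
MAX_WIDTH = 78      # assumption: longest line length seen in the output above


def help_string(option, description):
    """Format one option roughly like the aligned --help output above."""
    left = "  " + option
    pad = " " * INDENT_COLUMN
    # Wrap the description to the space left of the indent column.
    lines = textwrap.wrap(description, MAX_WIDTH - INDENT_COLUMN) or [""]
    if len(left) < INDENT_COLUMN:
        # Short option: the description starts on the same line.
        first = (left.ljust(INDENT_COLUMN) + lines[0]).rstrip()
        rest = lines[1:]
    else:
        # Long option: it gets its own line (the exact threshold is a guess).
        first = left
        rest = lines
    return "\n".join([first] + [pad + line for line in rest])


if __name__ == "__main__":
    print(help_string("--disable-ssconvert",
                      "Do not build ssconvert (command line spreadsheet "
                      "conversion tool)"))
    print(help_string('--enable-plugins="text html"',
                      "Compile only the listed plugins"))
```

The point of the sketch is that the padding is computed from the option text instead of being typed by hand, which is what hard-coded spaces in configure.in cannot do.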

Hopefully, reminding people about this will allow projects to clean up their configure.ac (or configure.in for those still using the old naming convention), or for their users to submit patches, so that the output is decently formatted and usable, even by automatic systems like zsh’s tab completion of ./configure options.

P.S.: since I’ve changed gnumeric’s configure.in script, I’m going to submit it upstream now, so there is no need for anyone else to fix that.

