Even though I did read Donnie’s posts with his proposal on how to better Gentoo, I don’t really want to discuss them at the moment, since I have to admit that Gentoo politics is far from what I want to care right now, I have little time lately and that little time should not be wasted in discussing non-technical stuff, in my opinion, since what I’m best doing is technical stuff. Yet, I wanted to at least discuss one problem I see with most non-technical ideas about how to improve Gentoo: the bug count idea.
Somehow, it seems like management-kind people like to use the bug count of something as a metric to decide whether it’s good or not. I disagree with this ferociously because it really does not say much about software, if we consider as closed bugs that have just been temporarily worked around. It would be like considering the count of lines of code as a metric to evaluate the value of software. Any half-decent software engineer know that the amount of lines of code alone is pointless, and that you should really consider the language (it really changes a lot if the lines can contain one or twenty instructions each), and the comment-to-code ratio.
If the Gentoo developers were evaluted based on how many open bugs there are or on the average time a bug is kept open, we’ll end up with an huge amount of bugs closed for “need info” or “invalid” or “works for me” (which has a vague “you suck” idea), and of course “later” which is one kind of resolution I really loathe. The problem is that sometimes, to reduce the impact of a bug on users, you should put a quick workaround, like a -j1 in the emake call, or you’ll have to wait for something else to happen (for instance you might need to use built_with_use checks until the Portage version that supports EAPI 2 is stable.
Here the problem is that the standard workflow used in Bugzilla does not suite the way Gentoo works at all. And not everybody in Gentoo follows the same policy when it comes to bugs. For instance, many people would find that any problem with the upstream code does not concern Gentoo, others feel like crashes and similar problems should be taken care of, but improvements and other requests should be sent upstream. Some people think that parallel make is a priority (I’m one of them) but others don’t care. All in all, before we decide to measure the performance of developers based on bugs, we should first come over with a strict policy on how bugs are handled.
But still, open bugs mean that we know there are problems, and might or might not mean that we’re actively working on solving them, it might well be that we’re waiting for someone else to take care of them for instance. How should we deal with this?
I sincerely think that we should see to add more states to the bugs state machine. For instance we could make so that there’s a “worked around” state, which would be used for stuff like parallel make being disabled, --as-needed turned off and similar. It would mean that the bug is still there and should be resolved, but in the mean time users shouldn’t be hitting it any longer. Furthermore, we should have a way to take a bug off our radars for a while by setting it “on hold” till another bug is solved. For instance, the new Portage goes stable so we can deal with all the bugs related to EAPI 2 features being needed.
Then we should make sure that bugs that are properly resolved, and confirmed so, are closed, so that we can be sure that they won’t come up again. Of course it can’t be the same person who has marked the bug as resolved to mark it as closed, but someone else, in the same team, or the user reporting the bug, should be able to confirm that the resolution is correct and the bug is definitely closed.
But the most important thing for me is to take away the idea that bugs are a way to measure how software sucks and consider them instead a documentation of what has yet to be done. This is probably why trackers different from Bugzilla often change the name of the entries to “tasks” or “issues”; language has a high psychological impact, and we might want to deal with that ourselves.
So, how many tasks have you completed today?
>
Read More... |
Digg This!
Although most of the software in Portage already follows the correct rule of not putting their plugins in /usr/lib or equivalent directories, during my collision detection analysis I did find a few that instead pollute the standard library path with their plugins.
In general, I see way too much pollution in the library path, and that is quite bad since the loader will have to iterate through all of it whenever it has to find a library that is not in the cache, or if environment variables are set that forces it to look for libraries in a different order. Also, the linker (which is quite slow already) needs to look there to find the libraries to link to, which means more and more work every time, to scan everything.
For this reason, the amount of files in /usr/lib that are not needed to be there should really be reduced drastically, this means for instance not installing shell scripts there, like quite a bit of software does, and especially not installing plugins, since those would be picked up by the ldconfig utility and written in a cachefile that should really be as small as possible.
It’s not just that though, there are more problems with software installing plugins, for instance I noticed that in the PAM modules directory /lib/security there is one plugin with a full-versioned name (that is with the final .0.0 which is not needed by PAM modules) and another that uses the lib prefix (also unneeded); both are likely built with libtool by somebody who knows not how to properly build plugins.
I really should be writing another further test to make sure that the packages get installed following the usual layout, I just don’t know how for now. It would be a nice way to identify “rogue” packages that install outside of the standard-defined directories.
In general, whenever you want to install plugins you should do so by putting them in a separate directory inside /usr/lib, usually pkglibdir when using automake (defaults to /usr/lib/PACKAGE_NAME). This also allows a much simpler approach to plugins loading. And if you’re not really allowing plug-ins to be added after the ones compiled at build time, then you should probably not use plugins but rather build them in.
I guess there is more work for my tinderbox now…
>
Read More... |
Digg This!
When checking today’s failure logs from the tinderbox, to check if there are packages failing with readline 6.0 beside GDB, I’ve noticed a few more failures since last time, mostly related to the new gcc 4.3.3 ebuild and the fact that -D_FORTIFY_SOURCE=2 is now enabled by default. Which by the way is what makes the whole thing too noisy for my taste .
While there are a few cases where the code is explicitly being rejected by the compiler for being wrong, most of the failures seems to be extra warnings that morph into failures because of the use of -Werror in released code. I think I talked about this kind of problem in passing in the past, but never wrote a full entry about it. I think the time has come for that.
The -Werror flag for GCC (used also by ICC and equivalent to Sun’s -errwarn=%all) is often considered useful to make sure one’s code is solid enough to build without warnings. This is good since warnings often enough result in errors, and as I noted yesterday having too many warnings can cause new ones to be ignored and thus create a domino effect to the point the whole software is screwed. But, it’s not a good idea to unconditionally set it up in the released code.
Why do I say that? Because things change, and especially, warnings are added. The fact that a new compiler is stricter and considers a particular piece of code as something to warn about is not going to change the quality of the software per-se; and while it’s true that fixing the warnings early can save from failing further down the road, users often enough just need the thing to build and work when they need it, and would prefer not to have to fix a few warnings first.
So we have two opposite considerations: enabling -Werror can allow the developers and the users interested in the total correctness of the program to identify new warnings earlier, and at the same time the remaining set of users who don’t care about correctness but just want the thing to work would like -Werror disabled. What’s the solution? First of all, learn to use particular -Werror= flags, (see this old post of mine for some information about it), and then you should make the thing optional.
See this is what makes free software quite powerful sometimes, optionality. Just add a switch, a know, a line to comment in and out, so that -Werror is used by default on developers builds but not on the normal user releases. For most non-autoconf-based build systems, -Werror is just passed along with the rest of the CFLAGS so it’s easy to deal with that, for autoconf-based systems, it’s not rare that it’s added at the end of the script, unconditionally. Why does that happen? Because passing it to the ./configure call, like any other compiler flag will almost certainly cause some autoconf checks to fail. No more no less.
True, sometimes what is warning in a version of GCC becomes error in the one after, so it’s not really a solution if the warnings are not taken care of. But that’s the very reason why GCC introduces them as warnings usually! It gives time to the developers to act on them before rejecting them out of the blue. Of course it would be nicer if GCC also added an extra specification like “this will become an error with release X.Y.Z” but still even that is often ignored so it does not really matter.
This becomes even more important for ebuild developers since having -Werror enabled does not really work well with Gentoo, since we might add new stricter GCC versions, or even one more entry in CFLAGS to enable further warnings (for instance I have -Wformat=2 -Wstrict-aliasing=2 -Wno-format-zero-length in mine), which would then cause packages to fail out of the blue. Unfortunately it seems that quite a bit of packages still use -Werror in their default build and not all Gentoo maintainers took care of removing it beforehand.
So please, don’t use -Werror in released code, make it optional, use it during development, but not in released code. And not in ebuilds either.
>
Read More... |
Digg This!
You might have noticed in the past months a series of issues with my presence on Planet Gentoo. Sometimes posts didn’t appear for a few days, then there have been issues with entries figuratively posted in the future, and a couple of planet spam really made my posts quite obnoxious to many. I didn’t like it either, seems like I had some problems with Typo when moved to Apache from lighttpd, and then there has been issues with Planet and its handling of Atom feeds and similar. Now these problems should be solved, Planet has moved to Venus software, and it now uses the Atom feeds again which are much more easily updated.
But this is not my topic today, today I wish to write about how you can really mess it up with XML technologies. Yesterday I wanted to prepare a feed for the news on the xine’s website so that it could be shown on Ohloh too . Since the idea is to use static content, I wanted to generate the feed, with XSLT, starting from the same data use to generate the news page . Not too difficult actually, I do something similar for my website as well .
But, since my website only needs to sort-of work, while the xine site needs to actually be usable, I decided to validate the generated content using the W3C validator ; the results were quite bad. Indeed, the content in the RSS feed needs to be escaped or just plain text, no raw XHTML is allowed.
So I turned to check Atom, which is supposedly better at things, and is being used for a lot of other stuff as well already. That really looks like XML technology for once, using the things that actually make it work nicely: namespaces. But if I look at my blog’s feed I do see a very complex XML file. I tried giving up on it for a while and gone back to RSS, but while the feed is simple around the entries, the entries themselves are quite a bit to deal with, especially since they require the RFC822 date format which is not really the nicest thing to deal with (for once, it expects days names and month names in English, and it’s far from easily parsed by a machine to translate in a generic date that can be translated in the feed’s user’s locale).
I reverted to Atom, created a new ebuild for the Atom schema for nxml (which by the way fail at allowing auto-completion in XSL files, I need to contact someone about that), and started looking at what is strictly needed. The result is a very clean feed which should work just fine for everybody. The code, as usual, is available on the repository .
As soon as I have time I’ll look into switching my website to also provide an Atom feed rather than an RSS feed. I’m also considering the idea of redirecting the requests for the RSS feed on my blog to Atom, if nobody gives me a good reason to keep RSS. I have already hidden them from the syndication links on the right, which now only present Atom feeds, and they are already the most requested compared to the RSS versions. For the ones who can’t see why I’d like to standardise on a single format: I don’t like redundancy where it’s not needed, and in particular, if there is no practical need to keep both, I can reduce the amount of work done by Typo by just hiding the RSS feeds and redirecting them from within Apache rather than keeping them to hit the application. Considering that typo creates feeds for each one of the tags, categories and posts (the latter I already hide and redirect to the main feed, since they make no sense to me), it’s a huge amount of requests that would be merged.
So if somebody has reasons for which the RSS feeds should be kept around, please speak now. Thanks!
>
Read More... |
Digg This!
So I ordered a pair of in-ear phones last Sunday, to make sure I could sleep decently at least once in a while. Since my last pair, which was Sennheiser brand, broken in less than an year (even though I did abuse them quite a bit with the various staying at the hospital), I decided to go with a brand suggested to me by a friend, Shure.
I couldn’t find any Shure hardware in the store near me, so I ordered them through the Apple Store, which works quite well especially when it comes to rapid delivery, indeed they arrived this afternoon. I opened the package, and started inspecting the box. It’s not the usual plastic blister, which is a nice change, especially for the pricetag of the thing, but when I felt around it to check labels (it’s something I do almost without thinking nowadays), I noticed something quite scary:

This is the warning label on my new Shure SE210 earphones' box. If you cannot read it clearly it says:
"This product contains a chemical known to the State of California to cause cancer and birth defects or other reproductive harm."
No clue about which chemical it is, if it is dangerous to be exposed to it by ingesting it, drinking it, or just using the earphones, or any other specifics. I called the Apple Store just to confirm the thing has the RoHS certification and they assured me it has. But still.
>
Read More... |
Digg This!
The compiler warnings are one of the most important features of modern optimising compilers. They replaced, for the good part, the use of the lint tool to identify possible errors in source code. Ignoring too many warnings usually mean that the software hasn’t been looked at and properly polished before release. More importantly, warnings can be promoted to errors with time, since often they show obsolete code structure or unstable code. An example of this is the warning about the size mismatch between integers and pointers, which became an error with the GCC 3.4 release, to make sure that code would run smoothly on 64-bit machines (which started to become much more common).
Usually, a new batch of warnings is added with each minor or major GCC release, but for some reason the latest micro release (GCC 4.3.3) enabled a few more of them by default. This is usually a good sign since a stricter compiler usually mean that code improves with it. Unfortunately I’m afraid the GCC developers have been a little too picky this time around.
You might remember how the most important warnings cannot become errors even if support for turning particular warnings in errors is present with recent GCC releases. But before that I also had some other problem with GCC warnings, in particular warnings against system headers .
That latter problem was quite nasty since by default GCC would hide from me cases where my code defined some preprocessor macro that the system headers were also going to define (maybe with different values). For that reason, I started building xine-lib with -Wsystem-headers to make sure that these case could be handled. It wasn’t much of a problem since the system headers were always quite clean after all, without throwing around warnings and similar. But this is no longer the case, with GCC 4.3.3 I started to get a much lower signal-to-noise ratio, since a lot of internal system headers started throwing warnings like “no previous prototype”. This is not good and really shows how glibc and gcc aren’t really well synchronised (similarly, readline and gdb, but that’s a story for another day).
So okay I disabled -Wsystem-headers since it produces too much noise and went looking for the remaining issues. The most common problem in xine-lib that the new gcc shows is that all the calls to asprintf() ignore the return value. This is technically wrong, but it should lead to little problems there since a negative value for asprintf mean that there is not enough memory to allocate the string, and thus most likely the program would like to abort. As it is, I’m not going to give them too much weight but rather remind myself I should branch xine-lib-1.2 and port it to glib .
Unfortunately, while the “warn about unused return value” idea is quite good for many functions, it is not for all of them. And somehow the new compiler also ignores the old workaround to shut the warning up (that would be a cast to void of the value returned by the function); while, again, technically good to have those warnings, sometimes you just don’t care whether a call succeeded or not because either way you’re going to proceed in the same way, thus you don’t spend time checking the value (usually, because you check it somehow later on). What the “warn about unused return value” should really point out is if there are leaks due to allocation functions whose return value (the address of the newly-allocated memory area) is ignored, since there is no way you can just ignore that without having an error.
One quite stupid place where I have seen the new compiler to throw a totally useless warning is related to the nice() system call; this is a piece of code from xine-lib:
#ifndef WIN32
/* nice(-value) will fail silently for normal users.
* however when running as root this may provide smoother
* playback. follow the link for more information:
* http://cambuca.ldhs.cetuc.puc-rio.br/~miguel/multimedia_sim/
*/
nice(-1);
#endif /* WIN32 */
(don’t get me started with the problems related to this function call, it’s not what I’m concerned with right now).
As you can see there is a comment about failing silently, which is exactly what the code wants; use it if it works, don’t if it doesn’t. Automagic, maybe, but I don’t see a problem with that. But with the new compiler this throws a warning because the return value is not checked. So it’s just a matter to check it and eventually log a warning so to make the compiler happy, no? It would be if it wasn’t for the special case of the nice() return value.
On success, the new nice value is returned (but see NOTES below). On error, -1 is returned, and errno is set appropriately.
[snip]
NOTES
[snip]
Since glibc 2.2.4, nice() is implemented as a library function that calls getpriority(2) to obtain the new nice value to be returned to the caller. With this implementation, a successful call can legitimately return -1. To reliably detect an error, set errno to 0 before the call, and check its value when nice() returns -1.
So basically the code would have to morph in something like the following:
#ifndef WIN32
/* nice(-value) will fail silently for normal users.
* however when running as root this may provide smoother
* playback. follow the link for more information:
* http://cambuca.ldhs.cetuc.puc-rio.br/~miguel/multimedia_sim/
*/
errno = 0;
res = nice(-1);
if ( res == -1 && errno != 0 )
lprintf("nice failed");
#endif /* WIN32 */
And just for the sake of argument, the only error that may come out of the nice() function is a permission error. While it certainly isn’t an enormous amount of code needed for the check, it really is superfluous for software that is interested in just making use of it if they have the capability to do so. And it becomes even more critical to not bother with this when you consider that xine is a multithreaded program, and that the errno interface is … well… let’s just say it’s not the nicest way to deal with errors in multithreaded software.
What’s the bottom line of this post? Well, I think that the new warnings added with GCC 4.3.3 aren’t bad per-se, they actually are quite useful, but just dumping a whole load of new warnings on the developers is not going to help, especially if there are so many situations where you would just be adding error messages in the application that might never be read by the user. The amount of warnings raised by a compiler should never be so high that the output is filled with them, otherwise the good warnings will just get ignored causing software to not improve.
I think for xine-lib I’ll check out some of the warnings (I at least found some scary code in one situation) and then I’ll disable locally the warn-unused-return warning, or sed out the asprintf() and nice() return values ignoring. In either case I’m going to have to remove some true positive to avoid having to deal with a load of false positives I shouldn’t be caring about. Not nice.
>
Read More... |
Digg This!
I’m writing this entry while I’m waiting for my pasta to get ready, in the lunch break I’m having between finishing some job task. For a series of reasons I’m going slowly at it because life in the past few days have been difficult. Adding to my problem with sleep, my neighbours resumed waking me up “early” in the morning (early by my standards, that is). Yes I know that for most people, 11am is not “early”, but given that I always tended to work during the night (easier not to be disturbed by family, friends, Windows users, ...), and that they know that (not adding the fact that my father works shifts so he also works nights quite often), I wouldn’t expect such amount of noise.
With “such amount of noise” I mean that I get woken up, while sleeping with my iPod playing podcasts, the headphones in, with my room’s door closed. And no I don’t have light sleep, once I get to to sleep, unless I know I have to wake up (either to receive a parcel, to go to work remotely, or whatever). This weekend I ended up sleeping just shy of six hours per night, which is not good for my health either. I have now ordered a pair of noise-isolating in-ear phones, they should arrive tomorrow via UPS, they at least tend to be quite on time. On the other hand the Italian postal service, that should deliver me three Amazon packages, takes the usual eternity.
For what most of my readers are concerned, I still have my tinderbox running, after a few tweaks. First of all, Zac provided me a new revision of the script that generates the list of packages that need to be upgraded (or installed) in the system, which takes good care of USE dependencies so that when they are expressed I can see them before the system tries to merge them in. Then I’m wondering about the order of merge. Since I noticed that more than a couple of time I had to suspend the tinderbox run in the midst of it, the lower-end of the list of packages tended to not be merged as often as the upper-end (where quite a few packages I know still fail to merge). This time I executed the tinderbox from the bottom up (more useful since the sys-* packages are lower), but I’m considering next time to just sort the list randomly before feeding it to the merge command so that there are better chances for all the packages to be built in a couple of iterations.
Speaking about tinderbox and packages, I noticed that there are lots of packages that waste time without good reason, doing useless processing. This includes, for instance, compressing the man pages before Portage does so. While one can understand that upstream would like to provide the complete features to the users, it’s also usually a task that distributions do pick up, and would make sense to provide an option “I’m a packager” to not execute them.
Now, you could argue that those are very small tasks, and even for packages installing ten or twenty man pages it doesn’t feel like too much time wasted. But please think of two things first of all: the compression often enough is done with a for loop in sh rather than with multiple Make rules (which would be ran in parallel), serialising the build, and taking more time on multi-core systems. Secondarily, the man pages will have to be decompressed and compressed again by Portage, so it’s about three time the work strictly needed.
Another problem is with packages that, not knowing where to find an interpreter, be it Perl, Ruby, Python or whatever else, check for it in configure or with other methods and then replace it in all their scripts. And once again quite often using for in sh rather than Make rules. Instead of doing that they should just use /usr/bin/env $interpreter to let the system find the right one, and not hardcode it in the files at all (unless you need a specific version, but that’s another point altogether).
Well, I’ve eaten my pasta now (bigoli with pesto, for those interested), so I’ll get a coffee and be back to work. I’ll try to write a more useful blog later on today after I’m done with work. I have lots of things to write about, included something about XBMC (from an user standpoint, I don’t have time to approach it with the packager’s eye).
>
Read More... |
Digg This!
Although I admit it’s tempting, I’m not going to enter the mess of complaints (warranted or not) about GIT that have found place on Planet GNOME. I don’t intend to go down on what my issues are with bzr either, since I think I exposed them already . I’m going to comment on a technical issue I have with Mercurial, and show why I find GIT more useful, at least in that case.
If you remember xine moved to Mercurial almost two years ago. The choice of Mercurial at the time was pushed because it seemed much more stable (git indeed had a few big changes since then), it was already being used for gxine, and it had better multi-platform support (running git on Solaris at the time was a problem, for instance). While I don’t think it’s (yet) the time to reconsider, especially since I haven’t been active in xine development for so long that my opinion wouldn’t matter, I’d like to share some insight about the problems I have with Mercurial, or at least with the Mercurial version that Alioth is using.
Let’s not start with the fact that hg does not seem to play too well with permissions, and the fact that we have a script to fix them on Alioth to make sure that we can all push to the newly created repositories. So if you think that setting up a remote GIT repository is hard, please try doing so with Mercurial, without screwing permissions up.
For what concerns command line interface, I agree that hg follows more the principle of least surprise, and indeed has an interface much more similar to CVS/SVN than git has. On the other hand, it requires quite a bit of wondering around to do stuff like git rebase, and it requires enabling extensions that are not enabled by default, for whatever reason.
The main problem I got with HG, though, is with the lack of named branches. I know that the newer versions should support them but I have been unable to find documentation about them, and anyway Alioth is not updated so it does not matter yet. With the lack of named branches, you basically have one repository per branch; while easier to deal with multiple build directories, it becomes quite space-hungry since the reflog is not shared between these repositories, while it is in git (if you clone one linux-2.6 repository, then decide you need a branch from another developer, you just add that remote and fetch it, and it’ll download the minimum amount of changesets needed to fill in the history, not a whole copy of the repository).
It also makes it much more cumbersome to create a scratch branch before doing more work (even more so because you lack a single-command rebase and you need to update, transplant and strip each time), which is why sometimes Darren kicked me for pushing changes that were still work in progress.
In git, since the changesets are shared between branches, a branch is quite cheap and you can branch N times without almost feeling it, with Hg, it’s not that simple. Indeed, now that I’m working at a git mirror for xine repositories I can show you some interesting data:
flame@midas /var/lib/git/xine/xine-lib.git $ git branch -lv
1.2/audio-out-conversion aafcaa5 Merge from 1.2 main branch.
1.2/buildtime-cpudetection d2cc5a1 Complete deinterlacers port.
1.2/macosx e373206 Merge from xine-lib-1.2
1.2/newdvdnav e58483c Update version info for libdvdnav.
* master 19ff012 "No newline at end of file" fixes.
xine-lib-1.2 e9a9058 Merge from 1.1.
flame@midas /var/lib/git/xine/xine-lib.git $ du -sh .
34M .
flame@midas ~/repos/xine $ ls -ld xine-lib*
drwxr-xr-x 12 flame flame 4096 Feb 21 12:01 xine-lib
drwxr-xr-x 13 flame flame 4096 Feb 21 12:19 xine-lib-1.2
drwxr-xr-x 13 flame flame 4096 Feb 21 13:00 xine-lib-1.2-audio-out-conversion
drwxr-xr-x 13 flame flame 4096 Feb 21 13:11 xine-lib-1.2-buildtime-cpudetection
drwxr-xr-x 13 flame flame 4096 Feb 21 13:12 xine-lib-1.2-macosx
drwxr-xr-x 12 flame flame 4096 Feb 21 13:28 xine-lib-1.2-mpz
drwxr-xr-x 13 flame flame 4096 Feb 21 13:30 xine-lib-1.2-newdvdnav
drwxr-xr-x 13 flame flame 4096 Feb 21 13:50 xine-lib-1.2-plugins-changes
drwxr-xr-x 12 flame flame 4096 Feb 21 12:53 xine-lib-gapless
drwxr-xr-x 12 flame flame 4096 Feb 21 13:56 xine-lib-mpeg2new
flame@midas ~/repos/xine $ du -csh xine-lib* | grep total
805M total
flame@midas ~/repos/xine $ du -csh xine-lib xine-lib-1.2 xine-lib-1.2-audio-out-conversion xine-lib-1.2-buildtime-cpudetection xine-lib-1.2-macosx xine-lib-1.2-newdvdnav | grep total
509M total
As you might guess the ~/repos/xine content are the Mercurial repositories. You can see the size difference between the two SCMs. Sincerely, even though I have tons of space, on the server I’d rather keep git rather than Mercurial.
If some Mercurial wizard knows how to work around this issue I got with Mercurial, I might consider it again, otherwise for the future it’ll always be git for me.
>
Read More... |
Digg This!
A little context for those reading me; I’m writing this post a Friday night when I was planning to meet with friends (why I didn’t is a long story and not important now). After I accepted that I wouldn’t have a friendly night I decided to finish a job-related task, but unfortunately I’ve had some issues with my system. Somehow the latest radeon driver is unstable (well it’s an experimental driver after all), and it messes up compiz; in turn after a while either X crashes or I’m forced to restart it. This wouldn’t be a problem if the emacs daemon worked as expected. Since it doesn’t, I lose my workspace, with the open files, and everything related to that. It’s obnoxious. Since this happened four times already today I decided to take the night off, but I wasn’t in the mood for playing, so I settled for watching Pirates of the Carribean 2 in Blu-Ray, and write out some notes regarding the topics I wanted to write about for quite a while.
The choice of topic was related to the actual context I’ve just written above. As I said GNU Emacs is acting badly when it comes to the daemon. While the idea of the daemon would be to share buffers (open files) between ttys, network and graphical session, and to actually allow restarting those sessions without losing your settings, your data, and your open files, it’s pretty badly implemented.
A few months ago I reported that as soon as X was killed by anything (or even closed properly), the whole emacs daemon went down. After some debugging it turned out to be a problem with the handling of message logging. When the clients closed they sent a message to be logged by the emacs daemon, but since it had no way to actually write it to a TTY session, it died. That problem have been solved.
Now the problem appear to be just the same mirrored around: after X dies, the emacs daemon process is still running, but as soon as I open a new client, it dies. I guess it’s still trying to logging. As of today the problem still happens with the CVS version.
So anyway, this reminded me of a problem I already wanted to discuss with a blog: user-tied services. Classically, you had user-level software that i s started by an user and services that are started by the init system when the system starts up. With time, software became less straightforward. We have hotplugged services, that start up when you connect hardware like, for instance, a bluetooth dongle, and we have session software that is started when you login and is stopped once you exit.
Now, most of the session-related software is started when you log into X, and is stopped when you exit, sometimes, though, you want processes to persist between sessions. This is the case of emacs, but also my use case for PulseAudio since I want for it to keep going from before I login to before I shut down the system straight. There are more cases of similar issues but let’s start with this for now.
So how do we handle these issues? Well for PulseAudio we have an init script for the systemwide daemon. It works, but it’s not the suggested method to handle PulseAudio (on the other hand is probably the only way to have a multi-user setup with more than one user able to play sound, but that’s for another day too). For emacs, we have a a multiplexed init script that provides one service per user; a similar method is available for other service. Indeed, in my list of things to work on regarding PulseAudio there is to add a similar multiplexed init script to run per-user sessions of PulseAudio without using the system wide instance (should solve a bit of problems).
So the issue should be solved with he multiplexed per-user init script, no? Unfortunately, no. To be able to add the init scripts to the runlevels to be started, you need to have the root privileges. To start, stop and restart the services, you also need root privileges. While you can use sudo to allow users to run the start/stop commands to the init script, this is far from being the proper solution.
What I’d like to have one day is a way to have user-linked services, in three type of runlevels: always running (start when the machine starts up, stop when the system shuts down), after the first login (the services are started at the first login and never stop till shutdown), and while logged in (the services start at the first login, and stop when the last session logs out).
At that point it would be then possible to provide init scripts capable of per-user multiplexing for stuff like mpd too, so that users could actually have the flexibility f choosing how to run any software in the tree.
Unfortunately I don’t have any idea on how to implement this right now, but I guess I could just throw this in for the Summer of Code ideas.
>
Read More... |
Digg This!
That’s a very common question as of lately, and somehow I feel like most people who haven’t dealt with ALSA in the past would find it very difficult to properly answer to it. Even myself I would have ignored one particular issue till last night, when I hit another reason why I want to keep PulseAudio as my main and only audio system as soon as possible, reducing direct ALSA access.
With modern systems you mostly get onboard sound cards, rather than PCI cards and similar, unless you’re somewhat crazy and want a proaudio card (like I did) or a Creative card (not yet sure what’s the fuzz about that). To be honest, my motherboard really does have a limited soundcard so I would have needed a PCI card anyway to get digital S/PDIF output, and I want S/PDIF because the room has too much electric noise to the point I can hear it on the speakers.
Somehow, ALSA seem to hide to most users that there are indeed a lot of limitations with the HDA hardware. This because the idea for the HDA was probably to shift as much work as possible into software space rather than doing so in hardware; this is why you need a driver fix for the headphones jack to work properly most of the time. While this can be seen as a way to produce cheap cards, it has to be said that software always had to compensate for hardware defects since it’s more flexible. And having most of the processing done in software means that you can actually fix it if there is a bug, rather than having to find a workaround if anything.
So, no hardware mixing, and ALSA has to use dmix, no hardware volume handling, and ALSA has to use softvol, no hardware resampling, and ALSA has its own resamplers, and so on so forth. But while ALSA has to cope with doing these things in the same process that are doing the audio output, and thus is not so good at coping with different processes accessing the audio device with different parameters.
Having a separate process doing the elaboration does not mean that you add more work to the system, you can actually make it do less work if, for instance, you ask that process to play a (cached) sound rather than having it opened, converted, and played back.
At the same time, the fact that PulseAudio handles mixing, volume and resampling in software does not mean that it adds to the work done by the software for most cards, since the most common cards nowadays, the HDA-based ones, already do all that in software, just you don’t see that explicitly because ALSA do them in process.
In my opinion, ALSA is really doing way too much as a library, since each time you open a device it has to parse a number of configuration files, to identify which definition to use (and even then, by default they are far from perfect, for instance for the ICE1712-based cards, which depending on the way they are wired may have two, four, six or eight channel outputs, the definition only suits the 8-channel model, and does not make sense with the lower end cards). And once it found the definition it has to initialise a number of plugins, internal or external, to perform the software functions.
And this is all without putting in the mix the LISP interpreter that alsa-lib ships with (I’d leave to Lennart to explain that one).
So if you think that PulseAudio is an over-engineered piece of software that performs functions in software that should be done in hardware, you better be a FreeBSD user. At least there the OSS subsystem does not try to do all the things that ALSA does (and on the other hand FreeBSD is trying to go the Microsoft way for the HDA cards, implementing the UAA specifications rather than having a huge table of quirks; as far as I can see that would be quite useful, if it’s going to work that is).
But even in that case you should probably find PulseAudio a good thing since it tries not to be Linux-specific and thus would allow for performing all the needed functions with a single cross-platform software without having to reimplement it in platform-specific systems like OSS or ALSA. That was my main reason to look into PulseAudio when I was working on Gentoo/FreeBSD.
For everyone who cares about users, let’s try to work all together to make PulseAudio better rather than attacking Lennart for trying to do the right thing, okay?
>
Read More... |
Digg This!
There is one particular topic that was in my TODO list of things to write about in the “For A Parallel World” series, and that topic is recursive make, the most common form of build system in use in this world.
The problems with recursive make has been exposed since at least 1997 by the almost famous Recursive Make Considered Harmful paper. I suggest the reading of the paper to everybody who’s interested in the problems of parallelisation of build systems.
As it turns out, automake supports non-recursive forms quite well, and indeed I use it on at least one project of mine . Lennart also uses it in PulseAudio, as all the modules are built in the same Makefile.am file even though their sources (even the generated ones) are split among different directories.
Unfortunately the paper has to be taken with a grain of salt. The reason why I’m saying that is that if you read it there are at least a couple of places where the author seems to misknown his make rules and defines a rule with two output files, and one where he ignores temporary files naming problems .
There are, of course, other solutions to this problem, for instance I remember the make command for FreeBSD to be able to run into directories in parallel just fine, but I sincerely think here we have to stop one particular problem first. I don’t have too many problems with recursive make for different binaries or final libraries, or for libraries that are shared among targets. Yes they tend to put more serialisation into the process but it’s not tremendously bad, especially not for the non-parallel case where the problem does not seem to appear for most users at all.
What I think is a problem, and I seriously detest it, is when the only reason to use recursive make is to keep the layout of the built object files the same as the source files, and then create sub-directories for logical parts of the same target. With automake this tends to require the creation of convenience noinst libraries that are built and linked against, but never installed. While this works, it tends to increase tremendously the complexity of the whole build, and the time required to build them, since sometimes they get compiled not only into archive file (.a static libraries) but also in final ELF shared objects, depending on their use. Since we know that linking is slow we should try to avoid doing it for no good reason, don’t you think?
In general, the noinst_LTLIBRARIES presence means that either you’re grouping sources that will be used just one in a recursive make system, or you’re creating internal convenience libraries which might even be more evil than that, since it can create two huge binaries like the case of Inkscape.
Once again, if you want your build system reviewed, feel free to drop me an email, depending on how much time I’ve got I’ll do my best to point out eventual fixes, or actually fix it.
>
Read More... |
Digg This!
When I’ve read some rants about Firefox I thought they were a little bit too much. Now, I start to wonder if they were quite to the point instead. But before I start I have to say I haven’t tried contacting anybody yet, neither from the Gentoo Mozilla team not upstream. And I’m sure the Gentoo Mozilla team are doing their best to make sure that they can provide a working Firefox still following upstream guidelines on trademarks.
This actually sprouted from my previous work inspecting library paths I went to check which libraries for firefox-bin were loaded from the system library directory, and noticed one curious thing: /usr/lib/libsqlite3.so was being loaded. What’s the problem? The problem is that I knew that xulrunner (at least built from sources) bundles its own copy of SQLite3, so I wondered if they used the system copy for the binary package. Funnily enough, they really don’t:
yamato link-collisions # ldd /opt/firefox/firefox-bin | grep sqlite3
libsqlite3.so => /opt/firefox/libsqlite3.so (0xf67e7000)
libsqlite3.so.0 => /usr/lib/libsqlite3.so.0 (0xf621e000)
yamato link-collisions # lddtree.sh /opt/firefox/firefox-bin | grep sqlite3 -B1
libxul.so => /opt/firefox/libxul.so
libsqlite3.so => /opt/firefox/libsqlite3.so
--
libsoftokn3.so => /usr/lib/nss/libsoftokn3.so
libsqlite3.so.0 => /usr/lib/libsqlite3.so.0
(The lddtree.sh script comes from pax-utils and uses scanelf. I have a similar script in my Ruby-Elf suite implemented as a testcase, it produces the same results, basically.)
So the binary version of the package uses the system copy of NSS and thus loads the system copy of SQLite3. I haven’t gone as far as checking where the symbols were resolved, but one of the two is going to be loaded and unused, wasting memory (clean and dirty, for relocated data sections). Not nice, but one can say it’s the default binary, and has to know to adapt. In truth the problem here is that upstream didn’t use rpath, and thus the firefox-bin program does not load all its libraries from the /opt/firefox directory (since the /usr/lib/nss directory comes first). Had they built their binary with rpath set to $ORIGIN it would have loaded everything from /opt/firefox without caring about the system libraries, like it was intended to do. Interestingly enough, they do just that for Solaris, but not for Linux where they prefer fiddling with LD_LIBRARY_PATH.
Next, I checked the /usr/bin/firefox started, which I already copied on the other post:
#!/bin/sh
export LD_LIBRARY_PATH="/usr/lib64/mozilla-firefox"
exec "/usr/lib64/mozilla-firefox"/firefox "$@"
Let’s ignore the problem with the rewriting of the environment variable, which I don’t care about right now, and check what it does. It adds the /usr/lib64/mozilla-firefox directory to the list of paths to load libraries from. Since it’s setting LD_LIBRARY_PATH all the library resolutions will have to be done manually rather than using the ld.so.cache file. So I checked which libraries it loads from there:
flame@yamato ~ % LD_LIBRARY_PATH=/usr/lib64/mozilla-firefox ldd /usr/lib64/mozilla-firefox/firefox | grep mozilla-firefox
flame@yamato ~ % scanelf -E ET_DYN /usr/lib64/mozilla-firefox
TYPE FILE
ET_DYN /usr/lib64/mozilla-firefox/libjemalloc.so
(The second commands finds all the libraries in the given path, by checking for ET_DYN, dynamic ELF, files.)
Okay so there is one library, but it’s not in the NEEDED lines of the firefox executable. Indeed that library is a preloadable library with a different malloc() implementation (remember I’ve written about similar things and commented about FreeBSD solution), which means it has to be passed through LD_PRELOAD to be useful, and I can’t see that to be used at all. Indeed, if I check the loaded libraries on my firefox process I can’t find it:
flame@yamato x86 % fgrep jemalloc /proc/`pidof firefox`/smaps
flame@yamato x86 %
Let’s go step by step though, for now we can say with enough safety that the loader is overwriting LD_LIBRARY_PATH with no apparent good reason. Which libraries does the firefox executable load then?
flame@yamato ~ % LD_LIBRARY_PATH=/usr/lib64/mozilla-firefox ldd /usr/lib64/mozilla-firefox/firefox
linux-vdso.so.1 => (0x00007fffcabfd000)
libdl.so.2 => /lib/libdl.so.2 (0x00007fa5c2647000)
libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/libstdc++.so.6 (0x00007fa5c2338000)
libc.so.6 => /lib/libc.so.6 (0x00007fa5c1fc5000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa5c284b000)
libm.so.6 => /lib/libm.so.6 (0x00007fa5c1d40000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007fa5c1b28000)
flame@yamato ~ % scanelf -n /usr/lib64/mozilla-firefox/firefox
TYPE NEEDED FILE
ET_EXEC libdl.so.2,libstdc++.so.6,libc.so.6 /usr/lib64/mozilla-firefox/firefox
It can’t be right, can it? We know that Firefox loads GTK+ and a bunch of other libraries, starting with xulrunner itself, but there is no link to those. But if you know your linker you should notice a funny thing: libdl.so.2. It means the exeutable is calling into the loader at runtime, which usually means dlopen() is used. Indeed it seems like the firefox executable loads the actual browser at runtime, as you can see by checking the smaps file.
Now there are two things to say here: there is a reason why firefox would be doing that, and the reason is that calling “firefox” with it open already should actually request a new window to be opened, rather than opening a new process. So basically I expect the executable to contain a launcher, that if a copy of firefox is running already just tells that to open a new window, and otherwise loads all the libraries and stuff. It’s a good idea, from one point of view because initialising all the graphical and rendering libraries just to tell another process to open a window would be a waste of resources. On the other hand, dlopen() is not the best performing approach and also creates problem to prelink.
I have no idea why it happens, but the binary package as released by upstream provides a script that seems to be taking care of the launching, and then a firefox-bin executable that doesn’t use dlopen() to load the Gecko engine and all the graphical user interface. I would very much like to know why we don’t do the same for from-source builds, I would sincerely expect that the results would be even better when using prelink and similar.
Now, let’s return a moment to the problem of the SQLite3 loaded twice for the binary release of Firefox, surely the same wouldn’t happen for the from-source version, would it? Check it by yourself:
flame@yamato x86 % fgrep sqlite /proc/`pidof firefox`/smaps
7fea6c8c2000-7fea6c935000 r-xp 00000000 fd:08 701632 /usr/lib64/libsqlite3.so.0.8.6
7fea6c935000-7fea6cb35000 ---p 00073000 fd:08 701632 /usr/lib64/libsqlite3.so.0.8.6
7fea6cb35000-7fea6cb36000 r--p 00073000 fd:08 701632 /usr/lib64/libsqlite3.so.0.8.6
7fea6cb36000-7fea6cb38000 rw-p 00074000 fd:08 701632 /usr/lib64/libsqlite3.so.0.8.6
7fea814dc000-7fea8154f000 r-xp 00000000 fd:08 24920 /usr/lib64/xulrunner-1.9/libsqlite3.so
7fea8154f000-7fea8174f000 ---p 00073000 fd:08 24920 /usr/lib64/xulrunner-1.9/libsqlite3.so
7fea8174f000-7fea81751000 r--p 00073000 fd:08 24920 /usr/lib64/xulrunner-1.9/libsqlite3.so
7fea81751000-7fea81752000 rw-p 00075000 fd:08 24920 /usr/lib64/xulrunner-1.9/libsqlite3.so
Yes, yes it does happen. So I have a process that is loading one library for no good reason at all at runtime, and not a little one at that, when it could probably, at this point, use a single system SQLite library. I say that it could, because now I have enough evidence to support that: if the two libraries had a different ABI, depending on which one the symbols resolve to, either xulrunner or NSS would be crashing down. Since ELF uses a flat namespace, the same symbol name cannot be resolved in two different libraries, and thus one of the two libraries using them would find them in the “?rong” copy. And no, before you ask, neither use symbol versioning.
So at this point the question is: can both Firefox upstream and the Gentoo Firefox ebuild start providing something that does more than just working and actually works properly?
>
Read More... |
Digg This!
I’m not sure why that happens, but almost every time I’m working on something, I find something different that diverges my attention, providing me with more insight on what is going wrong with software all over. This is what causes me to have a TODO list that continuously grows rather than stopping for a while.
Today, while I went on working on my dependency checker script which for now does not even exist, but for which I added a few more functions to Ruby-Elf (you can check them out if you wish), I’ve started noticing some interesting problems with the library search path of my tinderbox chroot.
The problem is that I got 47 entries in the /etc/ld.so.conf file, which means 47 paths that needs to be scanned, and yet I have two entries crated by the env files as LD_LIBRARY_PATH and yet I find bundled libraries with the Portage bash trigger that my collision detection script misses. This let me understand there is more to that than I have been seeing up to now, and got me to dig deeper.
So I checked out some of the entries in the ld.so.conf file, some of them are due to multiple slot per library for different library versions, like QCA does. This is one decent way to handle the problem, by making two libraries differing just by soname in different directories, and passing the proper -L flag to get one or the other. A much saner alternative would be to ensure that two libraries with different API would get different version names, just like glib 2.0 gets a libglib-2.0.so name, but that’s a different point altogether.
Some other entries are more worrisome. For instance, the cuda packages install all their libraries in /opt/cuda/lib, but the profiler installs there also Qt4 libraries, causing them to appear on the system library list, even though with smaller priority than the copy as installed by the Qt ebuilds. Similar things happen with the binary packages of Firefox and Thunderbird, since I have the two of them together with xulrunner, adding their library path to the system library path.
Instead of doing this, there are a few packages that install the libraries outside the library path, and then use a launcher script, whose main use is setting the LD_LIBRARY_PATH variable to make sure the loader can find their libraries. This is what, for instance, /usr/bin/kurso does. Indeed if you start the kurso (closed-source) executable directly you’re presented with a failure:
# /opt/kurso/bin/kurso3
/opt/kurso/bin/kurso3: symbol lookup error: /opt/kurso/bin/kurso3: undefined symbol: initPAnsiStrings
The symbol in question is defined in the libborqt library as provided by Kurso itself, which is loaded through dlopen() (and thus does not appear in neither ldd nor scanelf -n output, just so I remember you of this problem). I could now write a bit around the amount of software written using Kylix and the amount of copies of libborqt in the system, each with its own copy of libpng and zlib, but that’s beside the point for now. The problem is that it indeed does not work by just starting it directly, and thus a wrapper file is created.
These wrappers are a necessary evil for proprietary closed-source software, but are quite obnoxious when I see them for Free Software that just gets build improperly; that’s the case both for the binary packages of Firefox and similar and the source packages, since my current /usr/bin/firefox contains this:
#!/bin/sh
export LD_LIBRARY_PATH="/usr/lib64/mozilla-firefox"
exec "/usr/lib64/mozilla-firefox"/firefox "$@"
This has a minor evil in it (it overwrites my current LD_LIBRARY_PATH settings) and a nastier one, it really isn’t much of a good idea because then also the programs it will call will get the same value, unless it’s unset later on. In general this is not a tremendously bright idea since we do have a method, the use of DT_RUNPATH in ELF files, that allows us to tell an executable to find the libraries in a directory that is not in the path. This is tremendously useful, and should really be used much more, since setting LD_LIBRARY_PATH also means skipping over the loader cache to find the libraries for all the processes involved rather than just for the (direct) dependency of a single binary. You can see that this has been a problem for Sun too .
For what it’s worth, the OpenSolaris linker, provides special padding space since two years ago to allow changing the runpath of executable files you don’t have access to (like the kurso binary above). It’s too bad that we have to deal with software lacking such padding because it would be quite useful too.
Is this all? Not really no, since you also get env.d entries for packages like quagga that seems to install some libraries in a sub-directory of the standard library directory and then adding that to the set of scanned directories. I haven’t investigated much about it yet, this blog post was intended as just a warning call, and I’ll probably try to work further on this to make it work better, but in general if your package installs internal “core” libraries, that you rightly don’t want into the standard library path (to avoid polluting the search path), you should not add it to ld.so.conf through environment files, but should rather use the -rpath option during link to tell it to find the library in a different path.
For developers, if your package does some funny thing with its libraries and you’re not sure if you’re handling it properly, may I just ask you please to tell me about it? I’d be glad to look into it, and eventually post a full explanation of how it works in the blog so that it can be used as reference for the future. I also have a few very nasty issues to blog about on Firefox and xulrunner, but I’ll save those for another time, for now.
>
Read More... |
Digg This!
One of the worst thing that can happen in somebody’s life is when your dreams are scaring you out of your own sleep. As it turns out I’m in one of those situations. A nice period of my life ended just before Christmas, and now I’m in a bit of a pinch, with a late job, and no future (stable) job in view. I’m also out of luck with publishers since the last article I submitted to LWN was not even worth a reply, it seems.
I should be at least well happy about my health, one would expect, given that I am feeling better after the surgery and I just need to visit the hospital for some check-ups now. But even that is out of schedule, since I was supposed to be in for January, and it’s middle February now. The professor I had to reach is unreachable, so I had to pass through another doctor in the staff (whom I’m very grateful to for my previous staying too!).
But as it is said in Italy “one Pope dead, a new one is made”; I admit I’m not sure what the English equivalent would be but I’d expect it to refer to kings.
I’m currently feeling in quite a bluish mood but it’s going to be just fine as soon as I get some good nights’ sleep; relaxed sleep. The problem as I said is that my own dreams, or rather the content and the characters of the dreams I’m having lately, chase me out of bed. Even though I cannot remember the dreams by themselves, the general mood follows me when I wake up and, even though they should be pleasant dreams, they upset me very much.
Luckily I learnt to fight dreams, and nightmares, since I went to the hospital. My way of keeping them away from my mind is to listen to something that turns my attention to something much different just before sleeping. Podcasts have helped a lot about that, but sometimes I need more, longer content I haven’t listened to before. This is especially true when, like right now, Bill Maher is not on HBO so I cannot listen to new Real Time’s podcast episodes. For these times I corrupt a bit of my soul and buy audiobooks from the iTunes Store, yes with the freedom-hungry DRM on.
I was thus quite pleased when an anonymous sent me The Hitchhiker’s Guide to the Galaxy CDs from BBC Radio (and I have to say I envy British people for BBC Radio 4, News Quiz is one of my favourite shows). Even though it also sprouted for me a technical problem: how to convert the CDs in a format that makes use of 100% iPod’s features using just Free Software? I’m afraid I’m unable to answer that question just yet but I hope to be able to soon. Also thanks to the (for now unknown since it hasn’t arrived yet) person who sent me “I’m Sorry I Haven’t a Clue” CDs. I’m not sure what it is but I find the British humour refreshing. Yes I know this is neither normal nor sane …
The problem is that, the way this is going I’m unable to rest, even when I sleep, and thus I cannot work for more than a few hours on Free Software without my head starting to ache. And it’s difficult to sleep in the first place. While I would like to try cutting down on coffee, it turns out that I’m quite addicted to caffeine to the point that twice already in the past three weeks, when I tried to stay a day without getting one I would get a migraine so powerful I would be unable to crawl out of bed.
Anyway so that you know, even if I haven’t blogged about it in a while, nor I have opened new bugs, the tinderbox (or tinderflame to make it distinct from Patrick’s) is still working and crunching data. The new disks do help, since there was one (I’m afraid I know which one, I’ll write about it specifically in the future) that would make the system go stuck on pdflush, which as you might guess is not the nicest of the things. Now it seems to be working better.
Anyway, if you wish I made a special list to see if I can solve my sleep deprivation (although I’m waiting already a few things I ordered myself, so I should be set for a while), but even more importantly, there are two thing I’m going to ask users and developers reading me alike.
If you’re an user, try to raise concern with upstream projects about problems like proper --as-needed usage, parallel build and similar, I know my blog isn’t exactly the nicest place to look up information from but it should have enough to go around with issues like that. Any upstream package that fixes parallel make, --as-needed or autotools by itself is one less package I’ll have to look at when I decide to push forward my agenda of having proper packages around.
If instead you’re a developer, please help me by at least reviewing what I write, correcting me if needed, and especially submitting patches to my projects if you see they are wrong or incomplete. Having people collaborate on my projects is one thing I always miss.
>
Read More... |
Digg This!
Since I have in my TODO list to work on two binutils problems (the warning on softer—as-needed and the fix for PulseAudio build), I also started wondering why I haven’t heard, or rather read, anything about the gold linker .
Saying that I’m disappointed does not really cover much of it to be honest, since I don’t really wish to switch to a linker written in C++ any time soon. But I really hoped that it would generate enough momentum to find a solution. Because, yes, the ld linker that ships with binutils is tremendously slow to link C++ code, and as Linkers & Loaders let me understand now, the problem is not just the length of the (mangled) symbol names, but also the way that templates are expanded and linked together.
But still, I think it’s really worth investigating some alternative, which in my opinion needs not to be written in C++, with all the problems related to that. Saying that the gold linker is fast just because of the language it is written is absolutely naïve, since the problems lie quite deeper than that.
The main problem is that the current ld implementation is based, like the rest of the binutils tools, upon libbfd, an abstraction that allows to support multiple binary formats, not just ELF. It basically allows to use mostly the same interface on different operating systems with different executable formats: ELF under Linux, BSD and Solaris, Mach-O under Mac OS X and PE under Windows and more. While this allows to get a much more powerful ld command, it’s actually a bit of a bottleneck.
Even though the thing is designed well enough for not crumble easily, it is probably a good area to investigate to find why it’s so slow. Having an alternative, ELF-only linker available for users, Gentoo users especially, would likely be a good test. This would follow the same thing that Apple does on OSX (GCC calls Apple’s linker) as well as Sun under Solaris with their copy of GCC.
While I’m all for generic code, sometimes you need to have specialised tools if you want to access advanced features of files, or if you want to have a fast, optimised software.
The same thing can be said for the analysis tool provided by binutils, as I’ve written in my post about elfutils the nm, readelf and objdump tools as provided by binutils, to be generic, lack some of the useful defaults and different interface that elfutils have. Which goes to show why specialised tools here could help. I know that FreeBSD was working on providing replacement for these tools, under the BSD license as their usual. While that’s certainly an important step, I don’t remember reading anything about a new linker.
As it is, I haven’t gone out of my way to see if there are already some alternative linkers that work under Linux, beside the one provided by Sun’s compiler in Sun Studio Express (which has lots of problems on its own). If there is already one we should look at how it stands for what concerns features.
What we desire from a specialised linker, beside speed, is proper support for *.gnu.hash section, --as-needed-like features, no text relocation emitted in the code (which is a problem gold used to have at least), and possibly a better support to garbage collection of unused sections that could allow using it in production code without huge impact on performance as it seems to happen with -fdata-sections and -ffunction-sections.
I’m not going to work on this, but if somebody is interested in my opinion about using, in Gentoo, any linker in particular I’d be glad to look at them, not going to spare words though, so that you know.
>
Read More... |
Digg This!