Before I resume working on PAM (I need to implement a change to pam_lastlog to fix a pernicious bug), I wanted to write a quick entry for the paranoid among you who still use PAM for system login.
Since, as you most likely already know, MD5 is once again considered insecure, one obvious concern is that passwords stored as MD5 hashes on a system are not secure either. For this reason, if you’re using Linux-PAM, you can make use of SHA512 hashing for system passwords, which I already wrote about.
Remember that to use it you have to make sure your Linux-PAM (sys-libs/pam) is built against a recent enough version of glibc. Unfortunately the version of pambase with this feature hasn’t hit stable yet: the bug above is blocking it, and I’m going to have to hack at pam_lastlog to fix that.
What I didn’t write last time is that you can easily spot whether your system is using MD5 passwords by running this simple command as root:
# fgrep '$1$' /etc/shadow
Of course one has to access your /etc/shadow file to breach your passwords, so your system has to have been compromised before, but it’s still not nice if they can find out what your basic passwords are.
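The same check can be extended to summarise which hashing schemes are in use. The sketch below is only an illustration: the function name count_hash_schemes is made up for this example, and it relies on the crypt(3) convention that hashes start with $1$ for MD5, $5$ for SHA-256 and $6$ for SHA-512. Everything else (locked accounts, old DES hashes and so on) is lumped under "other".

```shell
# count_hash_schemes FILE
# Hypothetical helper: tally the hashing scheme of every entry in a
# shadow-format file by looking at the prefix of the second field.
count_hash_schemes() {
    awk -F: '
        $2 ~ /^\$1\$/ { n["md5"]++;    next }
        $2 ~ /^\$5\$/ { n["sha256"]++; next }
        $2 ~ /^\$6\$/ { n["sha512"]++; next }
                      { n["other"]++ }
        END { for (k in n) print k, n[k] }
    ' "$1" | sort
}
```

Run as root, count_hash_schemes /etc/shadow prints one line per scheme found, with the number of accounts using it.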
Moving on.
After a longish time, here is a new chapter of my widely read series For A Parallel World, about improving build systems to reduce build time on modern multiprocessor, multicore systems.
This time, rather than the usual build failures, I’m going to talk about a parallel install failure. Even though one might think of install as a task that rarely runs into problems like race conditions and the like, and even though it’s probably the part that gains the least from parallel make on a multicore system (since it’s usually I/O bound rather than CPU bound), it’s actually a very fragile part of many packages.
One of the common failures is due to an old install-sh script, used to simulate the install command on systems too old to have a POSIX-compatible one, and also used to create directories recursively if mkdir -p is missing. For a series of reasons, this hits pretty often on FreeBSD, but that is beside the point. It can easily be solved by replacing the old faulty script with an updated copy from automake or libtool, which does not have the problem.
A few times, the problem is instead due to a broken Makefile.am. Let’s take a practical example from some software I fixed recently after being called into action by nixnut: gramps. Please note that if you look at the bug now you’re going to spoil the post, since it contains the solution straight away, while I’m going to explain it step by step.
Let’s start from the reported build log:
test -z "/usr/share/gramps/docgen" || /bin/mkdir -p "/var/tmp/portage/app-misc/gramps-3.0.3/image//usr/share/gramps/docgen"
/usr/bin/install -c -m 644 'gtkprintpreview.glade' '/var/tmp/portage/app-misc/gramps-3.0.3/image//usr/share/gramps/docgen/gtkprintpreview.glade'
/usr/bin/install -c -m 644 'gtkprintpreview.glade' '/var/tmp/portage/app-misc/gramps-3.0.3/image//usr/share/gramps/docgen/gtkprintpreview.glade'
/usr/bin/install: cannot create regular file `/var/tmp/portage/app-misc/gramps-3.0.3/image//usr/share/gramps/docgen/gtkprintpreview.glade': File exists
make[3]: *** [install-docgenDATA] Error 1
make[3]: *** Waiting for unfinished jobs....
As usual, the first thing we look for when there is a parallel build (or install) failure is repeated commands. As I’ve shown in Case Study n. 2, when the same command is repeated multiple times it’s often due to mistakes in the Makefiles, so before suspecting a problem with the dependencies, I check for that. It’s far more common, especially on automake-based build systems.
So indeed we can see there are two calls to the install command for the file gtkprintpreview.glade (this also shows us that the problem is not an old and faulty install-sh script, since the call goes directly to the system command). Contrary to what happens when a build rule is wrongly expressed in the makefile, the double call during the install phase is usually present with and without parallel jobs. The difference is that when the two calls happen sequentially, the second just overwrites the result of the first; it wastes time but it succeeds. When parallel jobs are used, on the other hand, the two calls often enough happen at the same time, and thus we have a race condition.
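Spotting such repeated commands by eye gets tedious on long logs. As a rough sketch (not the method used for this post), the hypothetical helper below prints every install invocation that appears more than once in a saved build log. It assumes each command sits on a single line, as make normally prints it.

```shell
# find_repeated_installs LOGFILE
# Hypothetical helper: report install commands that were run more than once.
find_repeated_installs() {
    # Keep only lines that invoke install or install-sh with arguments,
    # then let uniq -d print one copy of each line that occurs at least twice.
    grep -E '(^|[[:space:]/])install(-sh)?[[:space:]]' "$1" | sort | uniq -d
}
```

Since the double call shows up in sequential installs too, the log of a plain non-parallel run is enough to feed it.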
Okay, so the next step as usual is to look at the offending Makefile.am:
[snip]
docgen_DATA = \
	gtkprintpreview.glade
dist_docgen_DATA = $(docgen_DATA)
[snip]
Here we’re at the core of the problem. The gtkprintpreview.glade file is part of the sources, and it has to be installed as part of the docgen class of files (thus in $docgendir). But the data installed in that path is listed twice, once in the docgen_DATA variable and once in dist_docgen_DATA, causing the file to be installed twice by two independent targets. Since the two targets are independent, with parallel jobs they will both run the same command at the same time.
Let me try to explain what the mistake was. By default the sources are packaged up in the final tarball, as long as they are not generated by rules during the make process; sometimes you want files that are built by make to be distributed anyway, and then you either have to use EXTRA_DIST or prefix dist_ to the class of the installed files, to make it explicit that the files have to be distributed. Unfortunately the gramps developers didn’t know automake well enough, and thought that dist_docgen_DATA worked much like EXTRA_DIST (maybe the package actually used EXTRA_DIST in the past, for all I know), and thus duplicated the variable content.
The solution? Just replace the use of docgen_DATA with dist_docgen_DATA and remove the second definition; the problem is then solved at the source.
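To look for the same mistake in other packages, a quick heuristic is to search for a dist_ variable defined as a plain reference to its non-dist_ twin. The sketch below only catches that exact pattern (dist_foo = $(foo)); the function name is hypothetical, and other ways of duplicating the list will slip through.

```shell
# find_dist_duplicates SRCDIR
# Hypothetical helper: list Makefile.am lines of the form
#   dist_NAME = $(NAME)
find_dist_duplicates() {
    # \1 must repeat the variable name captured right after "dist_".
    find "$1" -name Makefile.am -exec \
        grep -Hn '^dist_\([A-Za-z0-9_]*\)[[:space:]]*=[[:space:]]*\$(\1)' {} +
}
```

Each hit is printed with its file name and line number, so it is only a starting point for reading the Makefile.am by hand.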
If you’ve been following my blog for a while you probably remember how much I fought with VirtualBox once it was released to get it to work, so that I could use OpenSolaris. Nowadays, even with some quirks, VirtualBox Open Source Edition is fairly usable, and I’m using it not only for OpenSolaris but also for a Fedora system (which I use for comparing issues with Gentoo), a Windows install (that I use for my job), and a Slackware install that I use for kernel hacking.
Obviously, the problem is that the free version of VirtualBox comes with some disadvantages, like not being able to forward USB devices, having a limited range of hardware to virtualise and so on. This is not much of a problem for my use, but of course it would have been nicer if they had just open-sourced the whole lot. I guess the most obnoxious problem with VirtualBox (OSE at least, not sure about the proprietary version) is the inability to use a true block device as a virtual disk; instead you have to deal with the custom image format, which is really slow at times and needs to pass through the VFS.
For these reasons Luca suggested many times that I try out kvm instead, but I have to say one nice thing about VirtualBox is that it has a quite easy to use interface which lets me set up new virtual machines in just a few clicks. And since nowadays it also supports VT-x and similar, it’s not so bad at all.
But anyway, I wanted to try kvm, and tonight I finally decided to install it, together with the virt-manager frontend. Although there are lots of hopes for this, it’s not yet good enough, and it really isn’t usable for me at all. I guess I might actually get to hack at it, but maybe it is a bit too soon yet.
Continue reading on my blog for the reasoning, if you’re interested.
I’ve been told quite a few times that my posts tend to be too long, and boring for most basic users, so for a time I’ll try to use the “extended content” support from Typo, and see how people react. What this means is that the blog post is just summarised on feeds, and on aggregators like Planet, while the complete text can be read by accessing the article directly on my blog.
When I started my work reporting bundled libraries almost a year ago, my idea had a lot to do with sharing code and only marginally to do with the security issues related to bundled libraries. I had of course first-hand experience with the problem, since xine-lib had (and in part still has) a lot of bundled libraries. When I took over its maintainership in Gentoo, it was largely breaching policy, and the number of issues I had with that was huge. With time, and coordination with upstream (to the point of me becoming upstream), the issues were addressed, and nowadays most of xine-lib’s bundled libraries are ignored in favour of the system copies (where possible; some were modified so heavily that the system copy cannot be used, but that’s still something we’re fighting with). Nowadays, the 1.2 branch of xine-lib no longer has an FFmpeg copy at all, always using the system copy (or an eventual static copy built properly).
But lately I have started to see that what is obvious to me about the problems with bundled copies of libraries is not obvious to all developers, and even less obvious to “power users” who proxy-maintain ebuilds and just want them to work for them, rather than comply with Gentoo policies and standards. Which is why I think that sunrise and other overlays should always be scrutinised carefully before being added to a system.
At any rate, for this reason I’m going to explain in this post why you should not use bundled internal copies of libraries for packages added to Gentoo, and why in particular these packages should not be deemed stable at all.
I’ve already written about the common mistake of using AC_CANONICAL_TARGET in software that is not intended to be used as a compiler. Since I’m now using my tinderbox extensively, I thought it might be a good idea to check how many packages call it when they shouldn’t.
The test is really a quick and dirty bashrc snippet:
post_src_unpack() {
    # Look for every autoconf input file in the unpacked sources.
    find "${S}" -name configure.ac -o -name configure.in | while IFS= read -r acfile; do
        acdir=$(dirname "$acfile")
        pushd "$acdir" >/dev/null
        # Trace the macro: the log ends up non-empty only if it gets expanded.
        autoconf --trace AC_CANONICAL_TARGET 2>/dev/null > "${T}"/flameeyes-canonical-target.log
        if [[ -s "${T}"/flameeyes-canonical-target.log ]]; then
            ewarn "Flameeyes notice: AC_CANONICAL_TARGET used"
            cat "${T}"/flameeyes-canonical-target.log
        fi
        popd >/dev/null
    done
}
This provides me with enough information to inspect the software, and eventually provide patches to correct the mistake. As I said, this is a very common mistake, and I’ve fixed quite a few packages for this. Not only does it waste time identifying the target system, but it also provides a totally misleading --target option to the configure script, which confuses users and automatic systems alike; if we were to write software to generate ebuilds out of a source tarball with some basic common options, it would probably be confused a lot by the presence of such an option.
Since the whole build, host and target situation is often confusing, I’d like to try explaining it with some images; I think that might be a good way to show users and developers alike how the three machine definitions interact with each other. Since this is going to be a long post, in terms of size rather than content, because of the images, the extended explanation won’t be present in the feed.
Continue reading on my site for that.
To try explaining this in a very visual way, let’s say we have only three systems: a PowerBook laptop, a standard x86 computer, and a build service using x86-64 servers. The choice of the PowerBook as the smallest device was dictated by the fact that it was the only decent image I could find on OpenClipart for a system that would easily be seen as having a different architecture from the other two. I would have liked an ARM board, but that was wishing for too much.
The first obvious case is having a native compiler, no cross-compiling involved at all:

In this case, gcc’s configure script, the powerpc-linux-gnu-gcc compiler and the hellow program are all executed on the same system: the laptop. This is the standard case you have on a Gentoo system when building stuff out of most ebuilds. In this case host, build and target machines are all one and the same: powerpc-linux-gnu.
Then there is a very common case for embedded developers, cross-compilers:

In this case gcc’s configure (and thus build) is executed on a PC system, which also will run the powerpc-linux-gnu-gcc compiler, but the hellow program is still executed on the laptop. Host and build machines are i686-pc-linux-gnu while target is powerpc-linux-gnu.
The next scenario is uncommon for standalone developers but somewhat common with binary distributions for smaller systems:

In this case there is a build service that starts up the build for the compiler, that will then be executed by the laptop directly. In this case the build machine is x86_64-pc-linux-gnu while both host and target are powerpc-linux-gnu.
The final scenario involves all three systems at once and shows exactly the difference between the three machine definitions:

In this case the build service prepares a cross-compiler executed on a PC that will build software to run on the laptop. The build machine is x86_64-pc-linux-gnu, the host machine is i686-pc-linux-gnu and the target is powerpc-linux-gnu.
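In terms of configure options, the four scenarios boil down to which of --build, --host and --target need to be spelled out. The sketch below (the configure_line helper is invented for illustration) relies on the autoconf defaults: host falls back to build, and target falls back to host, so an option is only printed when it differs from its fallback.

```shell
# configure_line BUILD HOST TARGET
# Hypothetical helper: print the configure invocation for a given
# combination of build, host and target machines.
configure_line() {
    line="./configure --build=$1"
    # host defaults to build, so only state it when it differs
    if [ "$2" != "$1" ]; then line="$line --host=$2"; fi
    # target defaults to host, so only state it when it differs
    if [ "$3" != "$2" ]; then line="$line --target=$3"; fi
    printf '%s\n' "$line"
}

ppc=powerpc-linux-gnu x86=i686-pc-linux-gnu amd64=x86_64-pc-linux-gnu
configure_line "$ppc"   "$ppc" "$ppc"   # native compiler
configure_line "$x86"   "$x86" "$ppc"   # cross-compiler
configure_line "$amd64" "$ppc" "$ppc"   # native compiler built elsewhere
configure_line "$amd64" "$x86" "$ppc"   # all three machines differ
```

The second call, for instance, prints ./configure --build=i686-pc-linux-gnu --target=powerpc-linux-gnu. Only the second and fourth scenarios ever need the --target part.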
Now, this works pretty well for compilers, but what about other software? In most cases you have just two systems involved at most, one that will run the software and one that will build it, so there is no need for a target definition: everything is settled between build and host. And this is why you should not be calling AC_CANONICAL_TARGET unless you can figure out a far-fetched scenario where you can involve three computers with three different architectures, like in the last scenario.
There is one thing I noticed working on my linking collision script. While most software properly creates subdirectories to put plugins in, so that they don’t clash with others and do not pollute the LDPATH space, there are quite a few packages that don’t do that at all and install their plugins straight into the libdir.
Not only that, some packages install static archive versions of their own plugins, with no good reason since they are never linked in statically, but always dlopen’d.
Please don’t pollute LDPATH: if you can, make sure the plugins are installed in “pkglibdir” (that is /usr/lib/packagename), and make sure that they only install the shared object file, and eventually the libtool archive if the software uses libltdl to load them. The static archive is almost always unneeded and just a waste of time.
Also please remember that if you install core libraries in a path outside of the standard libdir (which is very good if the libraries are not to be linked against!), you should probably make sure that there is a proper runpath in the executables. What runpath does is tell the runtime linker to look for libraries in a path that is otherwise not reachable through the standard configuration files (/etc/ld.so.conf). A common mistake here is to install an env file that adds the directory where the core internal libraries are installed to LDPATH (or even worse LD_LIBRARY_PATH).
While this works, it is not much different from having them in the standard library path: both the runtime linker and the link editor will use the path from the configuration file anyway, so the library is not going to be hidden like you’d want for a private library. On the other hand, if only the executables provide their own runpath, then the two linkers will ignore the libraries altogether when dealing with anything else.
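To check whether an installed executable really carries its own runpath, the dynamic section can be inspected with readelf from binutils. This is just a sketch: the helper names are invented, and the parsing assumes the usual (RPATH) and (RUNPATH) lines, with the path list in square brackets, that readelf -d prints.

```shell
# Filter: read `readelf -d` output on stdin and print the runpath entries.
extract_runpath() {
    sed -n -e 's/.*(RUNPATH).*\[\(.*\)].*/RUNPATH \1/p' \
           -e 's/.*(RPATH).*\[\(.*\)].*/RPATH \1/p'
}

# show_runpath FILE
# Hypothetical helper: print the runpath of an ELF file, if it has one.
show_runpath() {
    readelf -d "$1" | extract_runpath
}
```

An executable that prints nothing here depends entirely on the system-wide search path to find its private libraries. With the GNU toolchain the runpath is normally set at link time, through something like -Wl,-rpath,/usr/lib/packagename.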
So please, be careful with what you push in the library path, okay?
I’ve already written about some of the differences between my tinderbox and Patrick’s, one of which is that my tinderbox does not install just one package, but tries to install all of them at once. This is a necessity for me to have enough material to work on with my library collisions script, but it has so many side effects that it makes it funny to work with.
The first problem is that sometimes packages don’t get merged together because of file collisions, which most of the time are caused by packages that install commands with names that are too generic, and some other times by packages that regenerate files that should not be present in the final image (like iconv’s charset.alias, which gets generated on Gentoo/FreeBSD systems for no good reason at all).
The second problem derives from the way the first problem is handled. When two packages install a file with the same name, rather than renaming one of them or both, it’s customary to “fix” the problem by adding blockers to each of the two packages so that they cannot be installed together. While it’s certainly better to have it expressed that way rather than having the merge fail after the compilation and install phases, it’s not really a solution, since it still disallows having the two packages on the same system. While this is acceptable for packages like the different GhostScript implementations, which serve the same task, it is not much of a solution when the packages are entirely independent of each other and have very different tasks.
I have also found one particular package (pnet) which had a very funky solution to its collision with boehm-gc, considering that it was installing a private copy of it. Obviously this was not the proper fix by a long shot.
If you have two packages that block each other, you have a few different ways to deal with that. If they provide the same function, you might as well install them with a prefix and then write an eselect module to choose between one or the other (which is something that ghostscript could very well be doing). If they only install executables with generic names, the commands might be renamed to prefix the command name with the package name. But sometimes these commands are not to be used by users at all, and are rather internal commands used by scripts for processing; in those cases, it would be a nice idea to move them into /usr/libexec/$PN/ so that they are taken out of the users’ path and won’t collide with each other.
While dealing with packages that install colliding files is not so easy, developers need to deal with them in a less “works for me” way, and think more of the general picture. As it is, there are enough packages in the tree that block each other for no real good reason, and this is upsetting.
So it’s Christmas day, and I’m skimming through the 32+ MB of logs generated by my elven script and I found some nice nuggets:
Symbol getopt_long@@ (32-bit UNIX System V ABI Intel 80386) present 35 times
/usr/bin/graph
/usr/bin/ebzipinfo
/usr/bin/spline
/usr/bin/double
/usr/bin/uupick
/usr/bin/autotrace
/usr/bin/uustat
/usr/sbin/ndtpd
/usr/bin/ode
/usr/sbin/ndtpcheck
/usr/bin/ebrefile
/usr/bin/uucp
/usr/sbin/ndtpcontrol
/usr/bin/uulog
/usr/sbin/uuxqt
/usr/bin/ebzip
/usr/bin/faad
/usr/bin/lha
/usr/lib/libsox.so.1.0.0
/usr/sbin/uucico
/usr/bin/ebinfo
/usr/sbin/uuconv
/usr/bin/plot
/usr/bin/tek2plot
/usr/lib/libc.so.5
/usr/bin/uuname
/usr/bin/uux
/usr/bin/ebstopcode
/usr/bin/stklos
/usr/bin/cu
/usr/bin/plotfont
/usr/bin/ebfont
/usr/bin/ebunzip
/usr/bin/pic2plot
/usr/sbin/uuchk
Symbol strncasecmp@@ (32-bit UNIX System V ABI Intel 80386) present 5 times
/usr/bin/xpilots
/usr/bin/gargoyle-agility
/usr/lib/libc.so.5
/usr/bin/xedit
/usr/bin/xpilot
Symbol strnlen@@ (32-bit UNIX System V ABI Intel 80386) present 5 times
/usr/bin/tarsync
/usr/lib/libCw.so.1.0.0
/usr/lib/libmba.so.0.9.1
/usr/lib/python2.5/site-packages/numarray/_chararray.so
/usr/bin/linksys-tftp
Symbol stricmp@@ (32-bit UNIX System V ABI Intel 80386) present 5 times
/usr/bin/qemacs
/usr/lib/libraidutil.so.0.0.0
/usr/lib/openbabel/2.2.0/inchiformat.so
/usr/lib/libxerces-c-3.0.so
/usr/lib/libIL.so.1.0.0
Now if you ignore the references to the old compatibility libc.so.5 you can still find that there are quite a few programs that reinvent the wheel, reimplementing some functions that the C library already provides. Now, this would be fine and dandy if the implementation was subtly different, or was done with some particular purpose in mind, like glib’s functions, but I really can’t find the reason for the situation above to happen.
These are usually smaller functions, but there is still no reason for them to be present, since it’s more than likely that most of their uses are replaced by the compiler itself anyway. More interesting is their presence in shared objects, since those would interpose around other calls, although this most likely won’t happen on glibc-based systems, since glibc provides versioned symbols which then wouldn’t be interposed by default. That’s still a problem for *BSD systems though.
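For a rough idea of how such a report can be produced without a database, the sketch below uses nm from GNU binutils. It is not the ruby-elf script that generated the listings above: the helper names are made up, it only considers function symbols (types T and W) in the dynamic symbol table, and it does no suppression of known-harmless symbols.

```shell
# list_defined FILE...
# Hypothetical helper: print "object symbol" for each dynamic function
# symbol that the given ELF files define.
list_defined() {
    for obj in "$@"; do
        nm -D --defined-only "$obj" 2>/dev/null |
            awk -v o="$obj" 'NF == 3 && $2 ~ /^[TW]$/ { print o, $3 }'
    done
}

# count_duplicates
# Read "object symbol" pairs on stdin and print "count symbol" for
# every symbol defined by more than one object, most frequent first.
count_duplicates() {
    awk '{ n[$2]++ } END { for (s in n) if (n[s] > 1) print n[s], s }' |
        sort -k1,1nr -k2,2
}
```

Something like list_defined /usr/bin/* /usr/lib/*.so* | count_duplicates then gives a crude version of the "present N times" summary, minus the per-file breakdown.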
This has a much lower priority in my list than identifying all libraries bundled by various packages (even when they cannot be fixed, because they are proprietary or something else), because these are unlikely to turn out to be security issues. On the other hand, such pointless duplication of common functions might as well cause security issues to be hidden; for instance, if a package were to reimplement mktemp, it would most likely be a problem.
Anyway, for those interested in finding out what’s duplicating code in their system, the new hit parade, derived directly from the bug, shows SQLite3 trying to climb up the ladder, together with boehm-gc. Why can’t people understand that there are libraries in the system already?
So after almost a year, I returned to working on the library collision detection that comes with ruby-elf. I now have a much more powerful system to work on, but I also have a much bigger set of samples to scan, in my tinderbox.
Besides a few issues that I’ve had to take care of, I’ve now got a database that contains a huge amount of data to process and useful information to derive. Just to give some statistics, the script harvested 24602 ELF files, counting 2713395 symbols, of which 326399 are duplicated between objects. These statistics already leave out all the suppressed symbols known up to now, but obviously there are more yet unknown.
To stick with statistics for now, rather than actual results, it’s interesting to know that there are about 1365671 C++ ABI symbols; I sincerely wonder how many of these should be hidden instead.
On a more interesting note, Samba confirms itself sub-optimal by not yet having a convenience library for its shared functions, and by copying the symbols over between the various pieces of the puzzle, including six different Python modules, whose total size could probably be cut in half, I guess.
Sticking with the Python side, there is one damn issue that is really upsetting me: about thirty different packages link the Python interpreter statically rather than dynamically, resulting in around 30 different copies of Python itself across the whole of Portage. Nasty. The problem is that the ebuild installs the shared and static libraries in two different paths, one of which is the private “config” path for Python. The packages picking that up will explicitly request it at link time, causing Python to be linked in statically rather than dynamically:
Symbol PyBaseString_Type@@ (32-bit UNIX System V ABI Intel 80386) present 30 times
/usr/games/bin/diameter
/usr/bin/vim
/usr/bin/gedit
/usr/lib/root/libPyROOT.so.5.22
/usr/lib/python2.5/site-packages/opencv/_highgui.so
/usr/lib/libpython2.5.so.1.0
/usr/bin/eog
/usr/games/lib/mcl/plugins/python.so
/usr/sbin/bextract
/usr/bin/zapping
/usr/lib/python2.5/site-packages/opencv/_cv.so
/usr/bin/gvim
/usr/bin/gwp
/usr/bin/cooledit
/usr/lib/planner/plugins/libpython-plugin.so
/usr/sbin/bacula-fd
/usr/sbin/bacula-sd
/usr/lib/dia/libpython_plugin.so
/usr/lib/xchat/plugins/python.so
/usr/lib/xchat-gnome/plugins/python.so
/usr/sbin/bacula-dir
/usr/lib/gnumeric/1.8.3/plugins/python-loader/python_loader.so
/usr/games/bin/adonthell
/usr/games/bin/kiki
/usr/lib/libgnt.so.0.0.0
/usr/bin/epiphany
/usr/lib/python2.5/site-packages/_lcms.so
/usr/lib/perl5/vendor_perl/5.8.8/i686-linux/auto/Inline/Python/Python.so
/usr/lib/apache2/modules/mod_wsgi.so
/usr/lib/apache2/modules/mod_python.so
As you can see this includes two Apache modules, and quite a few pieces of GNOME. This is quite nasty. My suggestion until this is sorted out is not to enable the python USE flag unless you really really really need it. The nastiest bit is that, since there has been a Python vulnerability, if you didn’t rebuild these packages after the bump you’ll still have them using a vulnerable interpreter. Do I really have to spell out how bad that is for stuff like Apache modules?
Short preamble: I’m in a very depressed mood, like I haven’t been in months; this is very bad for my health but usually means I can focus on things much better, so you might actually find that I’m doing more than usual. Of course you also have to count in that I’m working during the holidays, so it’s not going to be all nice, even leaving my depression aside.
As I’ve written, I don’t trust closed-source software in the slightest. Even though this does not really mean that free software is much better, process-wise, at dealing with bundled libraries (as the bundled libs bug shows), with free software, or at least open-source software, there is the chance to check the sources and fix the eventual issues.
This means that I won’t be using closed source software where security is a major concern, but since sometimes I have to use closed-source software, like Skype, or Sun’s compiler, it’s obvious that I have to find a compromise so I can still use them and yet feel reasonably safe. This is what is usually called having a mitigation strategy.
One of the most complex and well-known mitigation strategies is of course SELinux, which makes a Linux system more like an APC than a computer. But it is probably safe to consider such a system overkill for most installations, especially power user desktop systems.
Since this is, as I said, overkill, I’m more inclined to look at smaller strategies, one of which I have already discussed: pam_mktemp. This module creates per-user private directories that make it much harder to exploit insecure temporary file vulnerabilities. Which is very nice, since this seems to be a very common class of vulnerabilities, and my data shows that there is way too much software that still uses insecure functions to create temporary files, closed and open source alike.
Unfortunately, as you can read in my earlier blog post, this is not automatically a way out of the problem. The start-stop-daemon command from OpenRC plays nice with this only as of the last release, and even with that, there are problems. The first problem is that, the way pam_mktemp works, the software calling PAM to open the session needs to properly set up the environment with its changes (which is what s-s-d lacked in previous versions). This causes, for instance, the gnome-keyring daemon to start with the wrong temporary directory when started by the PAM session chain. Even though pam_mktemp is invoked before the daemon, by the time it’s started the TMPDIR variable is not set in the environment. The reason for this is that the variable should not be changed if the session chain aborts the login.
The second problem is that not all software supports TMPDIR properly; Emacs has been fixed recently and now the emacs daemon starts up properly, but other software ignores TMPDIR altogether. VirtualBox (of which I still have things to say beside this) does not respect it for instance, which means that the module wouldn’t have spared you from the recent vulnerability that involved the software.
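For shell scripts at least, honouring TMPDIR takes very little. As a minimal sketch (the function name and the default prefix are invented here), mktemp can be pointed at $TMPDIR with /tmp as the fallback; its -d option creates a directory that only the owner can access.

```shell
# make_scratch_dir [PREFIX]
# Hypothetical helper: create a private scratch directory and print its path.
make_scratch_dir() {
    # Honour TMPDIR when it is set (for instance by pam_mktemp),
    # and fall back to /tmp otherwise.
    mktemp -d "${TMPDIR:-/tmp}/${1:-scratch}.XXXXXX"
}
```

A script would then do something like dir=$(make_scratch_dir myscript) and keep all its temporary files inside that directory, instead of writing predictable names straight into /tmp.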
The third problem is that sometimes software expects TMPDIR to be world-readable, which is a bad assumption; Samba does this, and since s-s-d is now fixed, it now fails to work on my system. I still haven’t found out whether the PAM session chain was called at that point, and it’s just duplicating the problem with s-s-d with a different symptom, or whether it fails to call it entirely. In either case, it’s a thing that has to be fixed to make sure that mitigation strategies like this one become part of users’ default mindset.
But again this is just one part of the problem, and one part of mitigation. Other problems relate to the way we run some of the services, a lot of which still run as root rather than under an unprivileged user; while the git-daemon issue is now solved and the default install does not run as root any longer, there are more daemons that have the same problem.
Just as an example, I noticed that the iSCSI daemon ietd still runs as root, and I’ve added that to the list of software I have to check to see if I can improve it. Similarly, the init script for mpd does not use s-s-d to switch user but leaves it to mpd itself, spawning it by default with unneeded root privileges, and additionally not allowing pam_mktemp to create a new temporary directory for the mpd user (I have to spend some time on that since I’d also like to provide an alternative init script with multiplexing, which would then allow running multiple mpds for different users, and in my case just having the single mpd running as my own user rather than a different user entirely).
At any rate, I’m going to keep doing my best to make sure that secure defaults are in place in Gentoo, and that further mitigation strategies can be made available so that the users forced to use proprietary closed-source software don’t need to just accept whatever comes their way. Please join my efforts, if you can, by checking which software ignores TMPDIR and asking upstream nicely to fix the issue.
In my post regarding remote debugging (which I promised to finish with a second part, I just didn’t have time to test a couple of things), I’ve suggested the idea I’d like to have some kind of package splitting in Portage, to create multiple binary packages out of a single source package and ebuild, similarly to what distributions based on RPM or deb do (let’s call them RedHat and Debian, for historical reasons).
Now, I want to make sure nobody misunderstand me: I don’t intend to propose this as a way of removing the fine-grained control USE flags give us; I sincerely love that; and I also love not having to worry about installing -dev and -devel packages on my machines to be able to build software, even outside of the package manager’s control. I really find these two are strengths of Gentoo, rather than weakness, so I have no intention to fiddle with them. On the other hand, I think there are enough uses that would allow for an even finer control on binpkg level.
I’ve already given a scenario in my post about remote server debugging, but let’s try to show something different, something I’ve actually been thinking about myself. Yes I know this is a very vested interest for me, but I also think this is what makes Free Software great most of the time; we’re not solutions looking for problem, but usually solutions to problem one had at least at one point in time. Just like my writing support for NFS export on the HFS+ filesystem in Linux.
So let me try to introduce the scenario I’ve been thinking about. As it happens, I tend to a series of boxes in many offices for friends and friends of friends in my spare time, on the side. It’s not too bad, it does not pay my bills, but it does pay for some side things, which is good. Now since these offices usually use Windows, even though I obviously install Firefox as the second step after doing the system updates, it’s not unlikely that every other time I go there I have to clean up the systems. I think there are computers I’ve wiped up and reinstalled a few times already. I’ve now been thinking about setting up some firewalls based on Snort or similar. Since I am who I am, these would end up being Gentoo-based (as a side note, I’m tempted to set it up here so I can finally stop having trouble with Vista-based laptops that mess up my network). Oh and please, I know it might sound very stupid considering there are solutions good for this already, but considering how much I’m paid and the amount of money they are ready to spend (read: near to none), I would find it nicer to be paid to work on some Gentoo-related stuff than be paid to just look up and learn how to use already made equipment. Of course if you have suggestion, they are welcome anyway.
So anyway, in this situation I’d have to set up boxes that would usually feel very embedded-like: a common basis, the minimum maintenance possible, upgrades when needed. Donnie’s idea of using remote package fetching and instant deletion is not that good for this because it still requires a huge pipe to shove the data around; not only do I not have enough upload bandwidth to spare for binpkging a whole system with debug information, it would also be a hit on their bandwidth that most of my users wouldn’t like to take (whether they want to use BitTorrent or look up p0rn from the office is not my problem).
With this in mind, I’d sincerely find it much nicer to be able to split packages, Portage-side, into multiple binary packages that can be fetched, synced, or whatever else, independently, as needed. As I proposed, a binpkg for the debug information files, but also a binpkg for documentation (including man and info pages), one for development data (headers, pkg-config), and maybe one for the prepared sources, which I want to talk about in a moment. With an environment variable it shouldn’t be much of a problem to choose which of these split binary packages to install in the system without explicit request, with a default including all of them but the debug information and the sources. This would also replace the INSTALL_MASK approach as well as the noinfo, noman and nodoc FEATURES. It wouldn’t be a logical split of a package into multiple entries in the system, but rather a way to choose which parts to install, complementary to USE flags.
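To make the idea a bit more concrete, here is a minimal sketch of how such a selection could behave. Everything in it is hypothetical: Portage has no BINPKG_PARTS or DEFAULT_PARTS variable, and the part names (bin, doc, dev, debug, src) are made up for illustration. It only shows the USE-like semantics described above, where a bare name adds a part and a leading minus removes one of the defaults.

```shell
# Hypothetical defaults: everything except debug information and sources.
DEFAULT_PARTS="bin doc dev"

# Print the parts to install for a request such as "-doc debug".
# (Neither variable nor part names exist in Portage; this is a sketch.)
select_parts() {
    selected=""
    # Start from the defaults, dropping any the request disables with "-part".
    for part in ${DEFAULT_PARTS}; do
        case " $1 " in
            *" -${part} "*) ;;
            *) selected="${selected} ${part}" ;;
        esac
    done
    # Append explicitly requested parts that are not already selected.
    for token in $1; do
        case ${token} in
            -*) ;;
            *)
                case " ${selected} " in
                    *" ${token} "*) ;;
                    *) selected="${selected} ${token}" ;;
                esac
                ;;
        esac
    done
    # Unquoted on purpose, so the leading space is normalised away.
    echo ${selected}
}
```

Calling select_parts "-doc debug" would then pick the binaries, the development files and the debug information, while an empty request falls back to the defaults.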
As for packaging the sources as I said above, there are two interesting points to be made for that, or maybe three. The first problem is that when you have to distribute a system based on Gentoo, you cannot just provide the binaries; since many packages are released under the GNU GPL version 2, even if you didn’t change the sources at all you should be distributing them alongside the binaries; and we modify a lot of sources. For license compliance we should also provide the full set of sources from which the code is derived. This is especially tricky for embedded systems. By packaging up the sources used for the builds, embedded distributors would be able to just provide all the -src subpackages as the full sources for the system.
The second point is that you can use the source packages for debugging too. Since there is, as far as I know, no way to fully embed the source code of software in the debug sections of the files generated from it, the only way for GDB to display the source code lines during debugging is to have the source files used for the build available during the debugging session. This can easily be done by packaging up the sources and installing them in, say, /usr/src/portage/ when they are needed, from a subpackage.
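As a rough sketch of how the debugger side could work: GDB’s "set substitute-path" command rewrites the source paths recorded in the debug information, so the original build directory can be mapped onto wherever the sources were installed. The directory layout under /usr/src/portage/ and the gdb_with_sources helper are assumptions for illustration; the build directory shown is Portage’s default location with PORTAGE_TMPDIR left at /var/tmp.

```shell
# Assumed install location for the hypothetical "-src" subpackages.
SRC_ROOT=${SRC_ROOT:-/usr/src/portage}

# Print a gdb command line that maps the build directory recorded in the
# debug information onto the installed sources.
#   $1: category/package-version, $2: program to debug
gdb_with_sources() {
    builddir="/var/tmp/portage/$1/work"
    printf 'gdb -ex "set substitute-path %s %s" %s\n' \
        "${builddir}" "${SRC_ROOT}/$1" "$2"
}
```

The helper only prints the command rather than running it, so it can be inspected or pasted into a bug report first.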
A final point would be that by packaging sources in sub-packages, and distributing them, we could be reducing the overhead for users to unpack (maybe with uncommon package formats) and prepare sources (maybe with lots of patches and autotools rebuilding). Let’s say that every 6 hours a server produces md5-based source subpackages for all the ebuilds of the tree, or a subset of them. Users would then use those sources primarily, but still having the ebuilds to provide all the data and workflow so that the original untouched source would be enough to compile the package. Of course this would then require us to express dependencies on a per-phase basis, since then autotools wouldn’t be required at buildtime at all.
Okay I guess I’m really dreaming lately, but I think that throwing around some ideas is still better than not doing so, they can always be picked up and worked on; sometimes it worked.
I’ve already written something about automation for Gentoo bug search, but I think sometimes it’s not easy to understand that just using a huge tinderbox, even distributed, is not going to help much to make sure that software works. The problem is that sometimes, even if software builds fine, it’s just going to break at runtime, and even though tests help, when they are handled properly, they are far from complete solutions.
Problems like the one described in the post above for hfsplusutils are just impossible to catch at build time without running the tests on sample data (which reminds me of the uif2iso problem), but this can get even more subtle: while it’s true that most --as-needed failures happen at buildtime, there are quite a few that will not hit until runtime. One such case happened to me with libcompizconfig, compizconfig-python and fusion-icon: the last of these wouldn’t start because Python failed to load the second, because the first one wasn’t linking to libX11.
Now of course this could have been found if either libcompizconfig or compizconfig-python had a testsuite, but as I already said, running a tinderbox with testsuites enabled is probably not something that I would like to do on a daily basis.
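Short of a full testsuite, this particular class of failure can sometimes be spotted with a much cheaper check: "ldd -r" asks the loader to process all the relocations of an object and to report the symbols it cannot resolve. The sketch below assumes the "undefined symbol: NAME" output format of glibc’s ldd; the function names are made up for the example, and it is not something either tinderbox runs.

```shell
# Read `ldd -r` output on stdin and print the unresolved symbol names.
# Assumes glibc's "undefined symbol: NAME<tab>(FILE)" lines.
undefined_symbols() {
    sed -n 's/.*undefined symbol: \([^[:space:],]*\).*/\1/p' | sort -u
}

# Report whether a shared object references symbols that none of the
# libraries it links against provides.
check_object() {
    missing=$(ldd -r "$1" 2>&1 | undefined_symbols)
    if [ -n "${missing}" ]; then
        echo "$1: unresolved symbols:" ${missing}
        return 1
    fi
}
```

A plugin that legitimately expects its host program to provide some symbols will show up here too, so the output is a hint to look closer rather than a verdict.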
Especially for old software, there are problems like endianness issues, 64-bit arches and PIC code that are almost impossible to figure out at buildtime, and that need to be checked during software use. But it’s not just that. For binary packages, especially those of proprietary software, there usually isn’t a testsuite; for this reason, their executables are rarely checked at runtime for consistency. This becomes a problem because sometimes you have software that links against old versions of libraries. This is the same reason why adding the VirtualBox 2.1.0 binary package to the tree is going to take a while for Alessio: it uses the old ABI of libcap, which will require resurrecting, and maintaining, an old version of libcap just for that. (I have a few more issues to talk about regarding the new VirtualBox release but I’ll get to that another time.)
And yet, the reasons why neither my nor Patrick’s tinderbox can be a replacement for a more thorough approach to package testing do not end here. But before proceeding, I have to make a distinction between the different approaches Patrick and I took. Patrick’s tinderbox removes all the superfluous packages from the system when installing a new one, which is very good to test for missing dependencies; my method instead iterates over each of the packages in the tree and installs them one by one in the system, filling up the space, which can easily miss missing dependencies but provides more interesting results regarding the interaction of particular ebuilds.
So while my method glosses over broken runtime and buildtime dependencies, like pkg-config not being depended upon and similar, Patrick’s method is not going to hit problems like dev-scheme/chicken breaking most of the Mono packages (which would pick up /usr/bin/csc as the C# compiler rather than mcs), or collisions between unrelated packages.
This means that either one of the two tinderboxes is just not enough to find all the issues, and even the two of them together won’t be enough. Even adding AutoTua to that, it’s just not going to cut it. As Jeremy said on a blog post of mine, we need humans (developers and users) to report issues. I start to feel we also need some real numbers on how many users use each package. Yes, I know that’s going to be a popularity contest, and it’s likely that there will be people who would just go on to submit fake results, but even for tree cleaning, it’s important to know whether packages don’t have bugs filed against them because they are good, or just because nobody has used them in a long time.
Oh and so that you know, I currently have a little less than 1500 open bugs that I reported (and over 3000 bugs reported since I started contributing to Gentoo), and all of them were reported by hand; there are still issues that keep me from using scripts like pybugz. I’ll try to write about them; maybe Zac can find a solution to those, like he has been doing quite a lot for me lately. Thanks Zac!
There are many different but common misconceptions about debugging that are spread among those users who, having never learnt a programming language, cannot properly understand the difference between debug code and debug information. Some of these misconceptions cause misunderstandings about Gentoo’s way of handling the two things as separate and distinct features of a piece of software.
First of all, let’s start with what the -g, -ggdb and -g3 options are supposed to do. These three flags are used to add debug information, in the form of either stabs or DWARF data, depending on the architecture, to the compiled files, be they object files, shared objects or final executables. This data is used by debuggers like gdb to provide a meaningful backtrace, and is added to some special sections of the file. The difference between a file that is built with these options and a file that is built without them can be removed by using the strip command, since the options don’t touch the actual executable code or data entries. The only software that is susceptible to breaking with -g3 is the dynamic loader, and even there I’m not sure why.
The various levels of debugging information are used to provide various levels of backtracing, starting from giving the names of the functions called, and going as far as line numbers, source lines, and macro expansion (which is especially useful when debugging stuff like scanelf, which is composed of a huge amount of macro-based meta-functions). Even when the full debug information is present in files, it does not hinder performance, except during the first scan and read of the ELF files: the loader does not load the debug information by default, since it does not live in sections that are allocated in memory at runtime at all. (This is something you can easily understand once you know the difference between allocated and non-allocated ELF sections.)
Debug code, instead, means adding special instructions to the executable code for debugging purposes; the simplest example is the assert() macro used to make sure that unexpected code paths are not taken. Although this is often misused as a way to enforce limitations in functions, the original idea behind assertions was to make the program die in a way that is easy to debug when a condition that is supposed to be always true turns out to be false. These checks shouldn’t be needed during standard usage, or the failures should be handled gracefully if they do happen, so the assertions would just need to be taken out of the built code at that point, which is exactly what -DNDEBUG does. (On an autotools-related note, the AC_HEADER_ASSERT autoconf macro not only checks for the correct header, but also provides an easy to use --disable-assert option for the configure script to disable assertions altogether.) Unfortunately nowadays assertions are often used to check the behaviour of code at runtime, even though an error would then cause the software to abort, which makes it more difficult to just disable them altogether for final users.
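The effect of -DNDEBUG is easy to see with a throwaway test. The sketch below is just an illustration: the assert_demo function and the tiny C program are made up, and it assumes a C compiler is available as cc. It builds the same source twice, once as-is and once with -DNDEBUG, and reports how each binary ends.

```shell
# Build one C file with and without -DNDEBUG and report how each binary exits.
assert_demo() {
    dir=$(mktemp -d) || return 1
    cat > "${dir}/demo.c" <<'EOF'
#include <assert.h>
#include <stdio.h>

int main(void)
{
    int items = 0;
    /* Debug code: aborts the program when the condition is false. */
    assert(items > 0);
    puts("continuing with no items");
    return 0;
}
EOF
    # Same source, two builds: the second one compiles the assertion out.
    cc -o "${dir}/checked" "${dir}/demo.c" &&
    cc -DNDEBUG -o "${dir}/unchecked" "${dir}/demo.c" ||
        { rm -rf "${dir}"; return 1; }

    for variant in checked unchecked; do
        if "${dir}/${variant}" >/dev/null 2>&1; then
            echo "${variant}: ran to completion"
        else
            echo "${variant}: aborted"
        fi
    done
    rm -rf "${dir}"
}
```

Neither build involves -g at all, which is the point: whether the assertion is compiled in is entirely independent of whether debug information is present.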
But debug code can be much more complex, and might slow down operation a lot; it might be logging data extensively, it might fill the terminal with pointless information, it might check each and every step during processing. This type of code must certainly not be enabled for users’ runtime or their work would be greatly hindered.
Now that the distinctions are made, you can see why the splitdebug/strip FEATURES are distinct from the debug USE flag. If you want to just get a backtrace for a crash you got during execution, you need debug information, you don’t need debug code; debug code might actually stop the software from crashing, as could reducing the optimisation flags. For users, it’s more than likely that the debug USE flag wouldn’t be useful at all; for developers who know what to do, this fine-grained control is most likely the best option they have.
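For the common case of wanting backtraces and nothing else, the configuration could look roughly like this make.conf fragment. The specific flag values are an assumption picked for illustration, not a recommendation: the point is only that debug information comes from the compiler flags and FEATURES, while the debug USE flag is left alone.

```shell
# Debug information: ask the compiler to emit it, keeping the usual optimisation.
CFLAGS="-O2 -pipe -ggdb"
CXXFLAGS="${CFLAGS}"

# Have Portage move the debug information into separate files (under
# /usr/lib/debug) instead of discarding it when stripping the installed files.
FEATURES="splitdebug"

# Note what is absent: USE="debug" is not set, so no debug code is built.
```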
So please next time you think about mixing the debug USE flag and the splitdebug/strip FEATURES in the same idea, try to think of what exactly you want to achieve. And no, disabling -O2 is not always a good idea to have a meaningful backtrace, especially since as I said, -O0 might make stuff not build, so you shouldn’t be ready to just enable that unconditionally to get a backtrace for a bug report.
If you have followed my blog since I started writing, you might remember my post about imported libraries from last January and the follow-up related to OpenOffice. You might also know that I started some major work toward identifying imported libraries using my collision detection script, and that I postponed it till I had enough horsepower to run the script again.
And this is another reason why I’m working on installing as many packages as possible on my testing chroot. Now, of course the primary reason was to test for --as-needed support, but I’ve also been busy checking builds with glibc 2.8 and GCC 4.3, and recently glibc 2.9. In addition to this, the build is also providing me with some data about imported libraries.
With this simple piece of script, I’m doing a very rough cut analysis of the software that gets installed, to check for the most commonly imported libraries: zlib, expat, bz2lib, libpng, jpeg, and FFmpeg:
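# Start from an empty log so results from a previous build don't leak in.
rm -f "${T}"/flameeyes-scanelf-bundled.log
# Each symbol is one that rarely appears without the rest of its library.
for symbol in adler32 BZ2_decompress jpeg_mem_init XML_Parse avcodec_init png_get_libpng_ver; do
    # The "+" prefix restricts the match to symbols the file itself defines.
    scanelf -qRs "+${symbol}" "${D}" >> "${T}"/flameeyes-scanelf-bundled.log
done
# A non-empty log means some installed file defines one of those symbols.
if [[ -s "${T}"/flameeyes-scanelf-bundled.log ]]; then
    ewarn "Flameeyes QA Warning! Possibly bundled libraries"
    cat "${T}"/flameeyes-scanelf-bundled.log
fi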
This checks for some symbols that are usually not present without the rest of the library, and although it gives a few false positives, it does produce interesting results. For instance, while I knew FFmpeg is very often imported, and I expected zlib to be copied in every other piece of software, it’s interesting to know that expat is used as much as zlib, and that every time it’s imported rather than used from the system. This goes for both Free and Open Source Software and for proprietary closed-source software. The difference is that while you can fix the F/OSS software, you cannot fix the proprietary software.
What is the problem with imported libraries? The basic one is that they waste space and memory since they duplicate code already present in the system, but there is also one other issue: they create situations where even old, known, and widely fixed issues remain around for months, even years after they were disclosed. What has preserved proprietary software this well up to this point is mostly the so-called “security through obscurity” (http://en.wikipedia.org/wiki/Security_through_obscurity). You usually don’t know that the code is there and you don’t know in which codepath it’s used, which makes it much harder for novices to identify how to exploit those vulnerabilities. Unfortunately, this is far from being a true form of security.
Most people would now wonder, how can they mask the use of particular code? The first option is to build the library inside the software, which hides it from the eyes of the most naïve researchers; since the library is not loaded explicitly, it’s not possible to identify its use through the loading of the library itself. But of course the references to those libraries remain in the code, and indeed most of the time you’ll find the libraries’ symbols defined inside the executables and libraries of proprietary software. Which is exactly what my rough script checks. I could use pfunct from the seven dwarves to get the data out of DWARF debugging information, but proprietary software is obviously built without debug information so it would just waste my time. If they used hidden visibility, finding out the bundled libraries would be much, much harder.
Of course, finding which version of a library is bundled in an open source software package is trivial, since you just have to look through the headers to find the one defining the version, although bundled copies of expat often lack the expat.h header that contains that information. On proprietary software it is quite a bit more difficult.
For this reason I produced a set of three utilities that, given a shared object, find out the version of the bundled library. As it is, it quite obviously doesn’t work on final executables, but it’s a start at least. Running these tools on a series of proprietary software packages that bundle the libraries caused me some kind of hysteria: lots and lots of software still uses very old zlib versions, as well as old libpng versions. The current status is worrisome.
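Those utilities are not reproduced here, and the following is not how they work; it is a much cruder sketch of the same idea for zlib only. It relies on the fact that zlib’s deflate and inflate code each embed a copyright string containing the version number, and simply greps the binary for them. The function name is made up, and a library built with those strings removed will of course not be detected.

```shell
# Print the zlib version(s) a file appears to embed, judging only by the
# "deflate X.Y.Z Copyright" / "inflate X.Y.Z Copyright" strings zlib carries.
bundled_zlib_version() {
    # -a treats the binary as text, -o prints only the matching part.
    LC_ALL=C grep -a -o -E '(deflate|inflate) [0-9]+(\.[0-9]+)+ Copyright' "$1" |
        awk '{ print $2 }' | sort -u
}
```

Unlike a symbol-based check, this also works on final executables and stripped files, as long as the strings survived the build.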
Now, can somebody really trust proprietary software at this point? The only way I can trust Free Software is by making sure I can fix it, but there are so many forks and copies and bundles and morphings that evaluating the security of the software is difficult even there; on proprietary software, where you cannot be really sure at all about the origin of the software, the embedded libraries, and stuff like that, there’s no way I can trust that.
I think I’ll try my best to improve the situation of Free Software even when it comes to security; as the IE bug demonstrated, free software solutions like Firefox can be considered working secure alternatives even by media, we should try to play that card much more often.
flame@yamato ~ % touch /var/tmp/portage/test
touch: cannot touch `/var/tmp/portage/test': No space left on device
flame@yamato ~ % df -h | grep /var/tmp
32G 7.2G 23G 25% /var/tmp
flame@yamato ~ % df -i | grep /var/tmp
2097152 419433 1677719 21% /var/tmp
Now, a mount cycle later it worked fine, but it’s still not too nice since it caused all the running emerges to fail, just like XFS did, but without leaving a trace in the kernel log, which makes it obnoxious since it’s hard to debug. I hope 2.6.28 is going to be better; certainly tinderboxing is a nice way to stress-test filesystems.
I start to consider the idea of OpenSolaris, NFS, and InfiniBand…