On a couple of our servers running Solaris we noticed that it usually takes more than 10s to login. Once in everything is a snap. I quickly investigated it and this turned out to be interesting. I used truss(1M) to investigate what's going on from the moment I connect to the moment I have a working shell.
# truss -f -o /tmp/a -v all -adDE -p 408
Now I logged in to the system and analyzed
/tmp/a file. First I confirmed that it took over 10s to login. From the moment the connection was accepted to the moment I got interactive session it took about 11s as shown below:
[...]
408: 2.6594 0.0007 0.0000 fcntl(4, F_SETFL, (no flags)) = 0
[...]
12186: 14.0814 0.0001 0.0000 write(4, " | : { b7F S LB7A2 BA13".., 64) = 64
12196: read(0, 0x080473DF, 1) (sleeping...)
[...]
So I checked when it started to go wrong.
[...]
408: 2.6594 0.0007 0.0000 fcntl(4, F_SETFL, (no flags)) = 0
[...]
12196: 3.7245 0.0003 0.0003 forkx(0) = 12200
The connection started just before the fcntl showed above and everything is executing quick up-to
forkx() at 3.7245s. So far it took a little more than 1s. What happens next seems to be a loop of hundreds of entries like:
[...]
12200: 4.5521 0.0000 0.0000 ioctl(3, ZFS_IOC_USERSPACE_ONE, 0x08046790) Err#48 ENOTSUP
12200: 4.5522 0.0001 0.0000 ioctl(7, MNTIOC_GETMNTENT, 0x08047C1C) = 0
12200: 4.5917 0.0395 0.0002 ioctl(3, ZFS_IOC_OBJSET_STATS, 0x08046390) = 0
12200: 4.5918 0.0001 0.0000 getuid() = 35148 [35148]
12200: 4.5919 0.0001 0.0000 getuid() = 35148 [35148]
12200: 4.5919 0.0000 0.0000 door_info(6, 0x08046460) = 0
12200: target=189 proc=0x806FCD0 data=0xDEADBEED
12200: attributes=DOOR_UNREF|DOOR_NO_CANCEL
12200: uniquifier=289
12200: 4.5922 0.0003 0.0000 door_call(6, 0x080464D0) = 0
12200: data_ptr=FE430000 data_size=232
12200: desc_ptr=0x0 desc_num=0
12200: rbuf=0xFE430000 rsize=16384
12200: 4.5923 0.0001 0.0000 ioctl(3, ZFS_IOC_USERSPACE_ONE, 0x08046790) Err#48 ENOTSUP
12200: 4.5923 0.0000 0.0000 ioctl(7, MNTIOC_GETMNTENT, 0x08047C1C) = 0
12200: 4.6095 0.0172 0.0001 ioctl(3, ZFS_IOC_OBJSET_STATS, 0x08046390) = 0
12200: 4.6096 0.0001 0.0000 getuid() = 35148 [35148]
12200: 4.6096 0.0000 0.0000 getuid() = 35148 [35148]
12200: 4.6097 0.0001 0.0000 door_info(6, 0x08046460) = 0
12200: target=189 proc=0x806FCD0 data=0xDEADBEED
12200: attributes=DOOR_UNREF|DOOR_NO_CANCEL
12200: uniquifier=289
12200: 4.6098 0.0001 0.0000 door_call(6, 0x080464D0) = 0
12200: data_ptr=FE430000 data_size=232
12200: desc_ptr=0x0 desc_num=0
12200: rbuf=0xFE430000 rsize=16384
12200: 4.6098 0.0000 0.0000 ioctl(3, ZFS_IOC_USERSPACE_ONE, 0x08046790) Err#48 ENOTSUP
12200: 4.6099 0.0001 0.0000 ioctl(7, MNTIOC_GETMNTENT, 0x08047C1C) = 0
12200: 4.6201 0.0102 0.0001 ioctl(3, ZFS_IOC_OBJSET_STATS, 0x08046390) = 0
12200: 4.6202 0.0001 0.0000 getuid() = 35148 [35148]
12200: 4.6203 0.0001 0.0000 getuid() = 35148 [35148]
[...]
The process with PID 12200 was:
12200: 3.8229 0.0947 0.0013 execve("/usr/sbin/quota", 0x0811F9E8, 0x0811D008) argc = 1
12200: *** SUID: ruid/euid/suid = 35148 / 0 / 0 ***
12200: argv: /usr/sbin/quota
By visually looking at couple of pages of these ioctls it looked like most of the total time would be spent in doing
ioctl(3, ZFS_IOC_OBJSET_STATS, 0x08046390). Lets check it:
# grep "^12200:" /tmp/a |grep ioctl|grep ZFS_IOC_OBJSET_STATS|awk 'BEGIN{i=0}{i=i+$3}END{print i}'
9.7412
So out of 11s above ioctls along took 9. To have a clear picture lets check how much time the quota command took:
# grep "^12200:" /tmp/a |head -1
12200: 3.7245 3.7245 0.0000 forkx() (returning as child ...) = 12196
# grep "^12200:" /tmp/a |tail -1
12200: 13.9854 0.0003 0.0000 _exit(0)
So it took about 10s which means almost 100% of its time was spent doing above ioctls.
There are almost 300 zfs filesystems on this particular server so it all adds up. Sometimes quota completes very quickly sometimes it takes many seconds - I guess depending if requested data from all these zfs filesystems is cached or not. You need to run quota as a non-root user otherwise most checks are skipped and it is always quick.
Since we are not using quota on these systems anyway I commented out quota check in
/etc/profile and now a full login takes about 1s on average which is 10-12x improvement.
>
Read More... |
Digg This!
I'm especially interested in "support for OpenSolaris Boomer architecture" which hopefully means a working microphone in a guest on my laptop - that would mean a working Skype on VB/Windows :)
Version 3.1 will be a major update. The following major new features were added:
- Teleportation (aka live migration); migrate a live VM session from one machine to another
- VM states can now be restored from arbitrary snapshots instead of only the last one, and new snapshots can be taken from other snapshots as well (aka branched snapshots)
- 2D video acceleration for Windows guests; use the host video hardware for overlay stretching and colour conversion
- The network attachment type can be changed while a VM is running
- Experimental USB support for OpenSolaris hosts making use of the latest USB enhancements in Solaris Nevada 124 and higher.
- Significant performance improvements for PAE and AMD64 guests (VT-x and AMD-V only; normal (non-nested) paging)
- Experimental support for EFI (Extended Firmware Interface)
- VirtIO network device support
In addition, the following items were fixed and/or added:
- VMM: reduced IO-APIC overhead for 32 bits Windows NT/2000/XP/2003 guests; requires 64 bits support (VT-x only; bug #4392)
- VMM: fixed double timer interrupt delivery on old Linux kernels using IO-APIC (caused guest time to run at double speed; bug #3135)
- VMM: reinit VT-x and AMD-V after host suspend or hibernate; some BIOSes forget this (Windows hosts only; bug #5421)
- GUI: prevent starting a VM with a single mouse click (bug #2676)
- 3D support: major performance improvement in VBO processing
- 3D support: added GL_EXT_framebuffer_object, GL_EXT_compiled_vertex_array support
- 3D support: fix crashes in FarCry, SecondLife, Call of Duty, Unreal Tournament, Eve Online (bugs #2801, #2791)
- 3D support: fix graphics corruption in World of Warcraft (#2816)
- iSCSI: support iSCSI targets with more than 2TiB capacity
- VRDP: fixed occasional VRDP server crash (bug #5424)
- Network: fixed the E1000 emulation for QNX (and probably other) guests (bug #3206)
- Network: even if the virtual network cable was disconnected, some guests were able to send / receive packets (E1000; bug #5366)
- Network: even if the virtual network cable was disconnected, the PCNet card received some spurious packets which might confuse the guest (bug #4496)
- VMDK: fixed handling of split image variants
- VHD: fixed incompatibility with Hyper-V
- OVF: create manifest files on export and verify the content of an optional manifest file on import
- X11 based hosts: allow the user to specify their own scan code layout (bug #2302)
- Mac OS X hosts: don't auto show the menu and dock in fullscreen (#bug 4866)
- Solaris hosts: combined the kernel interface package into the VirtualBox main package
- Solaris hosts: support for OpenSolaris Boomer architecture (with OSS audio backend).
- Shared folders: fixed changing case of file names (bug #2520)
- Shared folders: VBOXSVR is visible in Network folder (bug #4842)
- Windows and Linux Additions: added balloon tip notifier if VirtualBox host version was updated and Additions are out of date
- Solaris Additions: fixed as_pagelock() failed errors affecting guest properties (bug #5337)
- Windows Additions: added automatic logon support for Windows Vista and Windows 7
- Windows Additions: fix crash in seamless mode (contributed by Huihong Luo)
- Linux Additions: added support for uninstalling the Linux Guest Additions (bug #4039)
- SDK: added object-oriented web service bindings for PHP5
>
Read More... |
Digg This!
PSARC/2009/557 ZFS send dedup was integrated yesterday. It allows to dedup zfs send|recv stream regardless if dedup is enabled or not on a sending and/or receiving side. Looks like it both pool and zfs send stream deduplication will be available in snv_128. Of course you can pull the sources and build them yourself if you want to start playing with dedupe now otherwise you will have to wait to the beginning of December when snv_128 should hit /dev Open Solaris repository.
>
Read More... |
Digg This!
Recently there was an article at OSNEWS "
Should ZFS Have a fsck Tool?". Well, no it shouldn't. I wanted to write an explanation why it is the case but Joerg was first and there is no point me repeating him. So if you wonder why ZFS doesn't need a fsck tool
read Joerg blog entry about it.
>
Read More... |
Digg This!
It took more than expected but it has been finally
integrated!
PSARC 2009/571 ZFS Deduplication Properties
6677093 zfs should have dedup capability
You can find code changes
here.
>
Read More... |
Digg This!
According to Jeff Bonwick (one of the main developers of ZFS)
the real reason is:
> Apple can currently just take the ZFS CDDL code and incorporate it
> (like they did with DTrace), but it may be that they wanted a "private
> license" from Sun (with appropriate technical support and
> indemnification), and the two entities couldn't come to mutually
> agreeable terms.
I cannot disclose details, but that is the essence of it.
Jeff
>
Read More... |
Digg This!
Yesterday a new zone brand
was integrated into Open Solaris: Solaris 10 Zones. It allows running Solaris 10 in a zone on Open Solaris. The technology will be very useful in some instances for enterprise customers once Solaris 11 is out or when deploying Open Solaris. This is very useful if you want to make use of latest hardware platforms and/or technologies delivered in Open Solaris (and Solaris 11 in the future) and you need to migrate your older environments running Solaris 8, 9 or 10 without introducing changes to them (to mitigate a risk or due to lack of resources to do a more detailed validation of a new environment). This is especially true with Open Solaris and Solaris 11 as they are so much different than older Solaris releases. Branded zones are a very neat solution to provide backward compatibility without putting too much hurdle on innovation or having most of the users who do not need it to pay the price of it.
>
Read More... |
Digg This!
The whole MySQL issue with Oracle and EU is just silly. I'm not going to repeat arguments here as there are many comments you can find and I believe that all that could be said about it has already been said. If you are interested in what is going on and why then read these two articles which summarize the whole issue very well.
Here is one from Groklaw and
this one is also very interesting.
It is really sad to see to what extend some people are greedy...
I think that RMS and Monty lost whatever credibility they had left. They definitely lost it completely in my eyes. Not that Monty cares as he is after more money here... He wants to cash-in two times... Call me idealist or whatever but I honestly believe that life and business is not *only* about money.
>
Read More... |
Digg This!
Somehow I'm not surprised that comments section is disabled in
this example of IBM's Black Magic. I won't comment on it myself as there is no point in repeating
Joerg arguments - I agree with him in 100%. This is about recent world record in
TPC-C benchmark from Oracle/Sun.
However I would like to suggest to Ms Stahl that there is another big advantage in IBM's setup she's missed - not only IBM did more tpmC/core but they also did:
- larger number or racks per tpmC
- larger tpmC/$
- larger power usage/tpmC
I think that somehow most customers would consider above three as more important than tpmC/core (assuming they would consider TPC-C relevant in the first place).
>
Read More... |
Digg This!