OSGalaxy

published on 2009-11-25 00:56:00 in the "Sun" category
Ben Rockwood

Just an update on the acquisition front... Oracle gets more time to respond to EC antitrust concerns. "The deadline for a final ruling has been put back to Jan. 27 from Jan. 19, which amounts to six additional working days for Oracle to win over the skeptical regulator."

It's become crystal clear, for those not following the issue, that this is really all about MySQL. As I and others have cited repeatedly, the de facto standard MySQL engine for enterprise deployments is InnoDB, which is already owned by Oracle, and that really puts a big dent in the argument. All this makes you wonder, would Oracle have still acquired Sun if Sun didn't own MySQL? I tend to think, yes. Which makes that deal seem all the more ridiculous. All the same, Sun paid $1B for it, so the suggestion that Oracle should just let that entity break back off is even more ridiculous, not to mention just bad business.




published on 2009-11-24 20:34:00 in the "OpenSolaris" category
Ben Rockwood

Nevada Build 128 (snv_128) is now closed and available as BFU or source tarball. This means that those who want to play with ZFS Dedup but don't want to build from source can give it a go.

It should be said that there have been a lot of exciting enhancements to Nevada recently. Here are some of the changes in the last couple of builds:

  • ZFS Dedup
  • zpool recovery support
  • More ZFS fixes and improvements than you can shake a stick at
  • Solaris now has bridging, and RBridges (IETF TRILL)
  • Crossbow now provides link-protection (IP Anti-Spoof); this was a Joyent request we're glad to see incorporated
  • Flowadm now implements remote_port attribute (was in the man page since the beginning but only added in 126)
  • ksh93 update 2
  • Solaris Hotplug Framework
  • Smartcard support was ripped out
  • ILB: Integrated L3/L4 Load balancer ... yes, that's right, an L3/L4 Load Balancer integrated INTO the Solaris kernel! This is my play toy atm.
  • iSCSI Boot
  • Piles and piles of COMSTAR and FCOE enhancements
  • FMA for Nehalem_EX
  • Solaris 10 zones
  • Fast Crash Dump
  • Lots of Audio improvements
  • Clearview IP Tunneling (ie: create IP tunnels via dladm and associate resource controls like any other link)
  • Datalink Administration from Non-Global Zones
  • Solaris Packet Capture
  • Marvell Yukon Gigabit Ethernet Driver
  • ... and on and on and on.

If you're not running at least Build 121 you're really behind the times. If you have the time, I highly recommend installing SX:CE 127 and BFUing up to 128... or, if you're busy with the holidays, make sure you set aside some time in December to really dig into the new hotness when SX:CE 128 releases.




published on 2009-11-20 21:41:00 in the "cuddletech" category
Ben Rockwood

Role models aren't something we have few of; sad that perhaps the most recent one comes from a beer commercial:

I mean, come on... his advice on careers "Find what you don't do well.... and don't do that thing." Classic!

Need something more expansive? Learn Chinese! If you find it difficult, try to learn Japanese... and then you'll go back and appreciate how much easier Chinese languages are.

Not intellectual enough? Need to stretch those brain cells a bit more? Then, I ask, what is justice? As a Christian I have all those answers, laid down thousands of years ago, but since apparently folks like to re-invent the wheel (something King Solomon explained to us about 1,000 BC... "There is nothing new under the sun"), try the discussion on Justice by Harvard's Michael Sandel. A fun and engaging discussion in one of Harvard's beautiful facilities, exploring the "Moral Side of Murder". It's an enjoyable mental exercise and well expressed.

If you're reading this post on an aggregator or via RSS and don't see the embedded video, just come here to cuddletech to see it properly.




published on 2009-11-19 18:57:00 in the "cuddletech" category
Ben Rockwood

Lots of folks have switched to Mac; it's the most commonly used laptop in the Bay Area now. Sometimes people give me flak for using it, but I'll tell you why I use a Mac laptop:

  1. It just works! When going to a client site, a conference, or just a cafe, there is nothing more embarrassing than spending 20 minutes trying to get your l337 *NIX laptop to connect to wireless or properly DHCP or work with a printer. This isn't as big a problem as it once was but it can still happen. This is especially the case if you ever do a presentation where you're fiddling with things in front of 30+ people. Macs just work, period.
  2. The Apps are high quality! Thanks to the Linux desktop invasion we have a lot of great apps for *NIX; however Mac apps have a very high standard for quality, all work more or less similarly, and there are lots of great apps. The problem I have on Windows these days is that there aren't as many great apps for Windows as there are for OS X.
  3. It's UNIX! This is the most important fact for me: it's a real desktop OS with a real UNIX underneath. I was a Mac hater prior to OS X, but developed a love affair with NeXT... when the two converged in OS X I was a happy camper indeed.
  4. The Apple Laptops are the best on the market! I cannot find a PC laptop with the same build quality and durability as Apple's. Most PCs use cheap plastics, are too thick, too flimsy, etc. The MacBook Pro 15" Aluminum is what I still use and love. The size is absolutely perfect, the thing is solid, and very comfortable to use. The power adapters are even better. Even if I wanted a machine just to run Solaris on metal, I'd want a MacBook Pro over any PC laptop available. In terms of hardware you really do seem to get what you pay for.

    Now, please note that I do not have nor do I ever plan to have a Mac desktop! For my daily work I need a real UNIX Workstation. I prefer to work with Enlightenment, Eterm, and have a real Solaris system on which to work. Without my desktop I can't accomplish real work, but for the road I need my MacBook Pro.

    So here are some of my "must have apps" for OS X:

    • iTerm: It once was that OS X's terminal was pretty basic and pathetic, glTerm and iTerm filled the void. Since that time the default terminal application has improved significantly making iTerm unnecessary, but I continue to be faithful to it.
    • Adium: Adium is the best multi-protocol IM client available for Mac. While iChat AV is fantastic for voice and video "chat", I want to keep my desktop tidy which means I want IRC style chat in multiple tabs, not windows. I just can't stand having a real discussion in those iChat balloons.
    • NewsFire: Best RSS reader, imho. The primary advantage of NewsFire is that it doesn't make RSS look like email! Email feels like work; I just want to flip through RSS and see what's new. NewsFire is free and really spiffy.
    • TrueCrypt: I'm not a really big crypto freak, I wish I were, but I'm lazy. Nevertheless, at some point you'll go on the road and sysadmins are bound to have text files containing sensitive information. TrueCrypt makes it easy to create small encrypted drives on which to store that data. Plus, the virtual drives it creates are cross-platform, so you're not locked into only retrieving the data on Mac like other encrypting archive apps.
    • Things: I think it's the best todo application available. It's light-weight and easy to use. OmniFocus is a much more structured application and I think is good for people who need rigorous structure to keep them honest, but Things can be made to do almost everything OmniFocus can do, if you choose to, or be used much more casually.
    • RealVNC: The most popular VNC Viewer application for OS X is "Chicken of the VNC". I love the name, love the icon, but a lot of times it doesn't work for me. RealVNC isn't so sexy but works every time without a problem.
    • Colloquy: Great IRC application. Many *NIX folks will prefer a more traditional terminal based IRC client, but if you're an XChat user who's looking for a nicely integrated IRC client for OS X, Colloquy is the best imho.
    • VirtualBox: Very powerful and free to boot. I use both VirtualBox and VMware Fusion. Honestly, VMware is slightly faster, but VirtualBox is still fantastic and the additional portability is handy.
    • Apache Directory Studio: If there is one nifty app the Windows boys have, it's Softerra LDAP Administrator. Apache Directory Studio is the best alternative I've seen, and I think will ultimately surpass Softerra's capabilities.
    • iShowU: Best screen recording app period. Very easy to use, very flexible and lightweight. When creating screencasts I recommend using the Quicktime Animation CODEC; you'll be happy with it.
    • globalSAN iSCSI initiator for OS X: It's sad that even in Snow Leopard we don't have an Apple supplied iSCSI Initiator, but thankfully globalSAN has us covered. It's free and works very well with COMSTAR.
    • Cornerstone: I didn't think Subversion needed a GUI... but Zennaware Cornerstone changed my mind. It's expensive, but if you do a lot of SVN work you won't want to miss it.

    I'll add some more to the honorable mention list...

    • Textmate
    • iWork '09
    • iLife '09
    • Skitch
    • iStumbler
    • Netbeans
    • Navicat Lite
    • OmniGraffle
    • ...

    On the hardware side, every UNIX Admin must be able to access an RS-232 serial console. This fact kept me away from Mac laptops for a long time. Which is why you need this:

    The Keyspan Serial-USB Adapter. Buy one, download the Keyspan Assistant software and install Zterm. Good to go!

    Finally let me point out 2 things which are already in Leopard that you may not be aware of:

    First, with the OS on the Install disk is the Apple Xcode IDE. Along with Xcode is the koolest GUI for DTrace you'll ever see: Instruments. It's really amazingly awesome and a must see.

    Secondly, OS X includes native Kerberos support and a ticket management GUI which is sort of buried: /System/Library/CoreServices/Kerberos. If you use Kerberos at all drag that binary onto your Dock for quick access. Several other hidden gems can be found in the same directory.




published on 2009-11-17 22:08:00 in the "cuddletech" category
Ben Rockwood

I'm really pleased to announce that Intel Capital has invested in Joyent.

This is a really exciting thing for us. This is the first time we've taken funding. We've really been proud of the fact that we haven't needed funding, but the benefits that come along with an investment from Intel are fantastic and just that relationship alone is exciting.

This is a big announcement not only for Joyent, but for OpenSolaris as well. We're thrilled that Intel supports not only what we're doing, but also how we're doing it. Combined with our recent expansion into China, we have a lot to be happy about.




published on 2009-11-17 21:54:00 in the "cuddletech" category
Ben Rockwood

My talk at LISA is now available. This is a 1 hour version of the ZFS in the Trenches talk. As always I hope that you find it informative and at least a little entertaining. Slides are here.

I also want to take this opportunity to say a heartfelt thank you to Deirdré Straughan, Lynn Rohrer, and Teresa Giacomini.

Because of Deirdré countless people around the globe can participate in and learn from important events. Not only does she spend a mind-boggling amount of time going to these events, but she has done a fantastic job producing very high quality content, and I think is setting the bar in community video presentation. We just don't get this kind of content from other top tier vendors and I really hope they take notice of her efforts and the benefit to Sun's current and prospective customer bases.

So please join me in extending your support and appreciation to Deirdré and everyone at Sun who makes these events accessible to the whole world!




published on 2009-11-13 02:47:00 in the "cuddletech" category
Ben Rockwood

For some time now I've gone back and forth on what is my personally preferred (LDAP) directory server; in particular between Sun Directory Server Enterprise Edition, OpenDS, and OpenLDAP. Each has advantages and trade-offs:

  • DSEE: Not free, complex, but well trusted, exceptional scalability
  • OpenDS: Free, super simple install and management GUI included, best starter directory for sure, but relatively new to the scene and thus needs to build more cred.
  • OpenLDAP: Not the best scalability, not the best replication or feature list, but very extensible, extremely well known and supported, free. Advanced features are much more straightforward than in competitors due to the flat config file (especially ACLs, TLS, etc)

So I put it to my loyal and educated readers... which is your directory of choice?




published on 2009-11-10 09:57:00 in the "OpenSolaris" category
Ben Rockwood

ZFS Deduplication was recently putback (Sun terminology for "commit") to ON (Solaris's primary codebase). That means it should go out at snv_128 (Build 128) due later this week.

Unable to wait for the BFU archives I resorted to actually building the code myself to play; something I've not felt the burning need to do for at least 2 years (I'll blog about that shortly). Here's the initial review...

In typical fashion putting ZFS Dedup to work is a trivial task. Zpools are created in the normal way; the dedup feature is enabled on a per-dataset basis and is therefore a simple matter of turning it on:

root@quadra ~$ zpool create stick c4t0d0
root@quadra ~$ zpool get all stick
NAME   PROPERTY       VALUE       SOURCE
stick  size           3.75G       -
stick  capacity       0%          -
stick  altroot        -           default
stick  health         ONLINE      -
stick  guid           12142487970365036186  default
stick  version        21          default
stick  bootfs         -           default
stick  delegation     on          default
stick  autoreplace    off         default
stick  cachefile      -           default
stick  failmode       wait        default
stick  listsnapshots  off         default
stick  autoexpand     off         default
stick  dedupratio     1.00x       -
stick  free           3.75G       -
stick  allocated      76.5K       -

Notice that there is no option to enable dedup for the pool, however there is a read-only "dedupratio" key. Because ZFS properties are inherited by child datasets we'll enable dedup on the root dataset, in this case "stick":

root@quadra ~$ zfs set dedup=on stick

Done! That's it. Really, you're done! Stop reading this now. :)

... ok, maybe I'll go into it a bit more.

As with many ZFS dataset properties, there can be more than one setting. The default value of the "dedup" property is "off". It can also be set to "on", "sha256", "verify", or "fletcher4,verify". "on" is simply a synonym for "sha256". "verify" is a synonym for "sha256,verify" and enables the ability to detect and correct hash collisions. However, this is very system intensive and is not recommended for casual use; if you require absolute integrity at all costs, go for it, but test your workload first. Phrases like "hash collision" can cause a panic, but remember that the odds are astronomical. For details on this see Jeff Bonwick's post on ZFS Dedup.
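
To put "astronomical" in rough numbers, here is a back-of-the-envelope sketch in Python using the standard birthday-bound approximation for a 256-bit hash. The pool size and the 128 KiB block size are assumptions chosen purely for illustration, not figures from any real system.

```python
# Birthday-bound estimate of the chance that any two distinct blocks
# share a SHA-256 checksum. All workload figures here are assumptions.

HASH_BITS = 256  # sha256 produces a 256-bit checksum


def collision_probability(unique_blocks, hash_bits=HASH_BITS):
    """Approximate P(at least one collision) as n*(n-1)/2 / 2**bits."""
    pairs = unique_blocks * (unique_blocks - 1) // 2
    return pairs / 2 ** hash_bits


# Hypothetical pool: 1 PiB of unique data stored in 128 KiB blocks.
blocks = (2 ** 50) // (128 * 1024)
probability = collision_probability(blocks)
print(f"{blocks} blocks -> collision probability ~ {probability:.1e}")
```

Even at that hypothetical scale the estimate is vanishingly small, which is why plain "on" is a reasonable default and "verify" is reserved for the truly paranoid.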

So, now for some testing. I've created my "stick" pool on a new 4GB micro-USB stick and enabled dedup. Let's copy in a bunch of JPEGs to several directories and see what happens:

root@quadra ~$ zfs list stick
NAME    USED  AVAIL  REFER  MOUNTPOINT
stick  73.5K  3.69G    21K  /stick
root@quadra ~$ mkdir /stick/userA
root@quadra ~$ mkdir /stick/userB
root@quadra ~$ mkdir /stick/userC
root@quadra ~$ cd img
root@quadra img$ time cp * /stick/userA
real    0m15.395s
user    0m0.005s
sys     0m0.174s
root@quadra img$ time cp * /stick/userB
real    0m15.952s
user    0m0.004s
sys     0m0.112s
root@quadra img$ time cp * /stick/userC
real    0m2.347s
user    0m0.004s
sys     0m0.125s

root@quadra img$ zfs list stick
NAME    USED  AVAIL  REFER  MOUNTPOINT
stick   203M  3.62G   203M  /stick

root@quadra img$ cd /stick/userA/
root@quadra userA$ du -sh .
74M     .

OK, notice that I'm copying in 74MB of data, 3 times, each to a different directory. (It's slow because it's a crappy USB stick.) If we run du it registers the proper size, and if we look at zfs list it shows the full size of 203MB. In fact, if I look at the dataset properties I have no indication at all of its on-disk size:

root@quadra userA$ zfs get all stick
NAME   PROPERTY              VALUE                  SOURCE
stick  type                  filesystem             -
stick  creation              Tue Nov 10  0:07 2009  -
stick  used                  220M                   -
stick  available             3.62G                  -
stick  referenced            220M                   -
stick  compressratio         1.00x                  -
stick  mounted               yes                    -
stick  quota                 none                   default
stick  reservation           none                   default
...

So here's the magic... look at the pool size:

root@quadra ~$ zpool list stick
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
stick  3.75G  72.5M  3.68G     1%  3.06x  ONLINE  -

Beautiful. 72.5MB allocated and we correctly see the dedup ratio of 3 (less than the file sizes, leading me to believe there are some duplicate images, which I don't doubt).

Yet again, ZFS makes it "just work". And you don't need a big huge expensive piece of gear; I'm deduping on this:

Suck it Data Domain. :)

For the elite ZFS Internals hackers out there, you can get a closer look at dedup using zdb -S (thanks to Jeff Victor for the tip):

root@quadra ~$ zdb -S stick
Simulated DDT histogram:

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     2      623   69.9M   69.9M   69.9M    1.83K    210M    210M    210M
     4       14   1.63M   1.63M   1.63M       84    9.8M    9.8M    9.8M
 Total      637   71.5M   71.5M   71.5M    1.91K    219M    219M    219M

dedup = 3.07, compress = 1.00, copies = 1.00, dedup * compress / copies = 3.07
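
The summary line can be roughly reproduced by hand. The Python sketch below assumes the dedup ratio is simply total referenced size divided by total allocated size, and uses the DSIZE values (in MB) from the two histogram rows above; the names are illustrative only.

```python
# Rows from the simulated DDT histogram: (refcnt, allocated MB, referenced MB)
ROWS = [
    (2, 69.9, 210.0),
    (4, 1.63, 9.8),
]


def dedup_ratio(rows):
    """Referenced size divided by allocated size (assumed definition)."""
    allocated = sum(alloc for _, alloc, _ in rows)
    referenced = sum(ref for _, _, ref in rows)
    return referenced / allocated


ratio = dedup_ratio(ROWS)
saved_mb = sum(ref - alloc for _, alloc, ref in ROWS)
print(f"dedup = {ratio:.2f}, space saved = {saved_mb:.1f} MB")
```

Under that assumption the two rows work out to the same 3.07 that zdb prints.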

So now it's time to really beat on this thing and see if and where it breaks. Dedup for the masses is coming in the mail!!!




published on 2009-11-09 22:44:00 in the "cuddletech" category
Ben Rockwood
benr@quadra ~$ zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
quadra   928G   518G   410G    55%  1.00x  ONLINE  -

Shiny. ;)

More to come soon....




published on 2009-11-08 07:09:00 in the "cuddletech" category
Ben Rockwood

One of the many fascinating things I discovered at LISA was that almost no one (no one at my sessions at LISA anyway) has heard of Dr. G: Medical Examiner.

Jan C. Garavaglia, M.D., (aka "Dr. G") is the chief medical examiner for the District Nine (Orange-Osceola) Medical Examiner's Office in Florida. An assortment of her cases are strung together to create the weekly show on Discovery Health "Dr. G: Medical Examiner."

Another...

I started watching the show because Tamarah is a Discovery Health channel junkie. She loves the medical detective shows such as Dr. G and Mystery Diagnosis. I am particularly drawn to the show when I do a lot of postmortem work on systems (aka: "core dump analysis"). Medical practice is a great model for how to approach problems systematically and to follow the story to its conclusion. I suspect many geeks (at least those who don't pontificate about not owning a TV) would also enjoy it.




published on 2009-11-01 04:51:00 in the "Solaris" category
Ben Rockwood

Here's something in the category of "things that make you go wha?!?": The OpenSolaris Security Summit has been renamed to simply "Solaris" Security Summit.

If we've been looking for the first shot fired at OpenSolaris this would seem to be it. The question is, what's next? When you combine this with the recent resurrection of "Solaris Next" (aka: Solaris 10++) it starts suggesting something is in the works, undoubtedly Oracle orchestrated.

Now, at this point I'm not jumping to any conclusions, and I don't think you should either. Oracle's intentions seem fairly clear at the moment and entirely positive for the future of Solaris and SPARC; and we know that X86 is also a part of that vision. Turning some love away from the OpenSolaris distro towards Solaris will be a welcome change for large enterprise customers, and undoubtedly a motivating factor.

My advice is to watch and wait... the wheels are turning.

If folks from Oracle/Sun are reading this; do what you wish with the Solaris product roadmap, but the community and source for Solaris are a critical part of a successful future. Please feel free to reassure us that we won't lose that. I personally rely on access to the source for problem analysis and research on a daily basis and having access to Solaris developers, both badged and unbadged, is something I never want to be without again.




published on 2009-10-30 18:38:00 in the "OpenSolaris" category
Ben Rockwood

If you were at or watched the events from CommunityOne this year you saw some nifty demos of Crossbow's "Vwire" capabilities through a graphical demo tool. Today that Virtual Wire Demo Tool is available for download!

Now, personally, I'm not a fan of the tool. GUIs such as this are useful for demonstrating complex system utilities in an executive friendly way... but Crossbow is so easy to use it needs no pretty GUI. Using dladm create-vnic and dladm create-etherstub, it's so simple to set up that I fail to see the point.

Nevertheless! If you are having trouble making a case for the awesome power of Crossbow to your slavemasters this may be just the tool to help you get the message across.
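
For reference, the whole setup amounts to a couple of commands. The Python sketch below only builds the command lines rather than running them; the etherstub and VNIC names are hypothetical, and actually executing the commands would need a Solaris host with Crossbow and suitable privileges.

```python
def vwire_commands(etherstub, vnics):
    """Return dladm argv lists for one etherstub with VNICs on top of it."""
    commands = [["dladm", "create-etherstub", etherstub]]
    for vnic in vnics:
        # -l names the link (here the etherstub) the VNIC is created over
        commands.append(["dladm", "create-vnic", "-l", etherstub, vnic])
    return commands


# Hypothetical names, printed for inspection only.
for argv in vwire_commands("stub0", ["vnic0", "vnic1"]):
    print(" ".join(argv))
```

Each list could be handed to subprocess.run on a suitable host; keeping them as data makes it easy to review what would be run first.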




published on 2009-10-29 22:13:00 in the "cuddletech" category
Ben Rockwood

News is popping up that will interest those interested in the Sun/Oracle deal.

I've made peace with Jonathan Schwartz but those who haven't will no doubt love to bash on his pay. The first data I've seen in a while comes fresh from the AP: Sun CEO's pay package cut by a third in '09. According to the article, his 2008FY compensation was $11.1 Million, and it looks like 2009FY will come in at only $7 Million, information which came from the Sun Proxy filing with the SEC on Wednesday. One thing I have always wondered is what his personal driver costs... apparently the driver is company provided and costs over $45,000 per year (figured from the $55,000 spent on both driver and 401K match).

So, hey, even guys who make millions of dollars per year try to max out their 401K.... what does that tell ya? :)

Based on the same proxy filings, El Reg reports yet more on the compensation front. They report that Scott McNealy owns approx 2.3% of Sun. They estimate that if he exercises 3.1M in options by the end of Dec his cut will be $164.5M. Going on, El Reg reports that Jonathan has almost 1.5M options and 592K shares... so he comes away with $19.8M.

The Reg also counts up the total number of layoffs in the past 12 months at 8,000 (I assume that includes the 3,000 currently being chopped).

We got some nice news from Oracle this week by way of a FAQ: Oracle and Sun Overview and FAQ (Dated October 27, 2009). My questions regarding X86 and Solaris were included:

 What are Oracle’s plans for Solaris?

Oracle plans to spend more money developing Solaris than Sun does now. The
industry leading capabilities of the Solaris operating system make it the leader
in performance, scalability, reliability, and security – all of which are core
requirements for our customers. Oracle plans to enhance our investment in
Solaris to push core technologies to the next level as quickly as possible. Today
there are more applications available on Solaris than any other operating system
in the world. In addition, the combination of Oracle and Sun engineering
teams in database and operating system open up a new set of opportunities
to create exciting innovations for customers with respect to performance,
operational efficiency, security, and cost of ownership.

 What are Oracle's plans for x86?

The extremely broad and volume use of x86 makes it an important
building block for servers as well as other parts of the combined Oracle
and Sun portfolio. We plan to continue to engineer server and appliance
products based on x86. In addition, x86 is of course a key element of both
Sun and Oracle's software portfolio, with Solaris and Oracle Enterprise
Linux as well as all of the software of both companies robustly sold and
supported in the x86 marketplace.

So this fits perfectly in line with what we've heard to date, namely that Solaris rules and X86 is a critical offering as part of other offerings.

Finally, the Financial Times is reporting that Russian Anti-Trust is making life rough and FT perhaps foolishly plays up the headline by asking "is the deal about to unravel?" Read it for yourself but I'm not jumping to any conclusions.

Will this never end? Despite Oracle's pledged $9.50 per share, JAVA has dropped to $8.27 today, suggesting a lack of confidence. And I think most of us in the various communities have come to terms with the prospect of Oracle and are ready for things to get moving. There is a lot to suggest that Oracle is already calling the shots at Sun to various degrees, as we saw at Oracle OpenWorld recently. Besides that, at this point Sun is damaged beyond hope of repair... if this deal doesn't close soon we're all going to be in a world of hurt.

Let's get this deal done! Give the execs their money so they can retire and stop f***ing the company, and let's go kill IBM.




published on 2009-10-23 17:51:00 in the "cuddletech" category
Ben Rockwood

Little hobby electronics company SparkFun Electronics just got a cease-and-desist from SPARC International because "SparkFun" may be confused by consumers as being associated with the SPARC trademarks.

Come on guys.... let's be level headed. I think it's clever branding and they are in no way confused with SPARC processors or any of the companies that are members of SI.




published on 2009-10-21 06:36:00 in the "cuddletech" category
Ben Rockwood

Recently we talked about Solaris Auditing (BSM) in the Real World. Like BSM, Extended Accounting is a fantastic feature of Solaris that is utterly useless without tools. Solaris goes so far as giving you the capability but not so far as to hand you the rest of the solution on a silver platter. On one hand this means that the technology isn't pigeonholed due to the capabilities of a single tool, but at the same time it creates a barrier to entry that causes many people to simply ignore it altogether. So, yet again, let me provide a simple tool to fill some of that void.

In a previous post, Solaris Extended Accounting, I described Extended Accounting and provided two scripts to get you started: one was a PERL script to dump Extended Accounting ("exacct") data files and the other was called "prettyproc", which output Process Accounting files in a more human friendly way. This post should be viewed as Part 2 of that post.

When & How to use Extended Accounting

The most basic explanation of Extended Accounting is this: a facility that records certain events upon completion for later analysis. Those certain events depend on which of the four accounting types we're using. For processes, the cumulative data maintained by Solaris microstate accounting is written into a single record at process termination. For tasks, which are groups of processes within a single project, the same applies but recorded on each task termination rather than process. For (Crossbow) net, aggregate network utilization is written out at regular intervals (15 seconds). We'll ignore IPQoS "flow" Accounting entirely for the time being.

So the first thing we should say is that Extended Accounting is not a monitoring facility. If you want to know how much CPU or Memory is being used at some given time you should rely upon Kstats or /proc statistics on a polling schedule.

What Extended Accounting is good for is reporting. Consider 'net' accounting; every 15 seconds a record is created for each data link (dladm show-link). You could easily create a report at some interval (hour, day, week, month?) of total bytes/packets sent/received on each link, create a graph, or, perhaps most likely, calculate the 95th percentile on the links. Now, in the case of 'net' accounting you could also use an external system to poll the data remotely via SNMP or locally via kstats, but this might serve as a better "definitive" local record.
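
As a rough sketch of the 95th percentile idea, assume the per-interval byte counts for one link have already been pulled out of the 'net' records into a plain Python list (the extraction step is not shown, and the function name is illustrative). This uses the nearest-rank method; billing systems vary in exactly how they define the percentile.

```python
INTERVAL_SECONDS = 15  # 'net' accounting record interval described above


def percentile_95(bytes_per_interval, interval=INTERVAL_SECONDS):
    """Nearest-rank 95th percentile of link throughput in bits per second."""
    if not bytes_per_interval:
        raise ValueError("need at least one sample")
    # Convert each interval's byte count into an average bit rate.
    rates = sorted(count * 8 / interval for count in bytes_per_interval)
    # Integer ceiling of 0.95 * n, avoiding floating point rounding.
    rank = (95 * len(rates) + 99) // 100
    return rates[rank - 1]


# Made-up samples: 100 intervals with steadily increasing traffic.
samples = [i * 15 for i in range(1, 101)]
print(percentile_95(samples))
```

The top 5% of intervals are effectively thrown away, so short bursts don't dominate the reported figure.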

Proc accounting is fuzzy ground though. The best way I can explain process accounting is to imagine that every time you executed a command Solaris was secretly running "time(1)" and then storing the output on your behalf.

benr@quadra Downloads$ time tar xfj flash_player_10_solaris_x86.tar.bz2 

real    0m0.763s
user    0m0.705s
sys     0m0.070s

This is, essentially, what's happening! Solaris maintains a lot of detail on what processes are doing (known as "microstate accounting"). Normally, when a process terminates that data is simply discarded; however, if Process Extended Accounting is enabled it's dumped out as a record! From this record we can see interesting stats such as when the process started, when it finished (real time), how much CPU time it spent in kernel-land (sys time), how much CPU time it spent in user-land (user time), how many context switches it made, how much swapping it did, what its average RSS memory usage was, etc, etc, etc.

But as wonderful as this is, I have to make it crystal clear that this data isn't written out until a process terminates! If MySQL runs for 4 months, it outputs a single record when it is finally shut down, and that record is the accumulation of that full 4 months of running!

Here is the exception. Proc and Task records can be "full" or "partial". When a process/task terminates and creates a record, that's a "full record". However, using "wracct" we can force a process or task to create a "partial record", which is essentially a way of saying "Just tell me what you've got so far!" The rub is that, in the proc case, that data is cumulative, so if you wanted to report on what a process has done in the last 24 hours you need to write a partial record every 24 hours and then find the difference between yesterday's partial record and today's. Talk about fun.
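
The differencing step itself is simple once the records are parsed. A minimal Python sketch, assuming each partial record has been reduced to a dictionary of cumulative counters; the field names and values below are hypothetical and not the actual exacct catalog names:

```python
def usage_between(earlier, later):
    """Subtract two cumulative partial records to get usage in between."""
    return {key: later[key] - earlier[key] for key in later if key in earlier}


# Hypothetical partial records for one long-running process, 24 hours apart.
yesterday = {"cpu_user_sec": 1200, "cpu_sys_sec": 300, "ctx_switches": 50000}
today = {"cpu_user_sec": 1500, "cpu_sys_sec": 340, "ctx_switches": 61000}

print(usage_between(yesterday, today))
```

Only counters present in both records are compared, so a field that appears in one snapshot but not the other is simply skipped.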

Now, besides all that, who actually bills users or reports usage based on total CPU time? Total context switches? This isn't the 1970's nor is this likely to be a Super Computer reporting computational time. In short, the data probably isn't terribly useful as a basis for billing in this day and age without some creative thought.

So then lets think... what can we determine from the data. Based on CPU usage we could determine what the top 5 CPU consuming processes were. Based on average RSS usage we could determine what the top memory consumers were. So on and so forth. Interesting perhaps... but worth it?
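
A "top consumers" report of that kind is only a few lines once the records are parsed. The Python sketch below assumes each process record is a dictionary; the field names and the sample values are made up for illustration.

```python
import heapq


def top_consumers(records, key, count=5):
    """Return the `count` records with the largest value for `key`."""
    return heapq.nlargest(count, records, key=lambda rec: rec[key])


# Hypothetical, already-parsed process records.
RECORDS = [
    {"command": "mysqld", "cpu_sec": 5400.0, "avg_rss_kb": 812000},
    {"command": "httpd", "cpu_sec": 320.5, "avg_rss_kb": 45000},
    {"command": "tar", "cpu_sec": 0.8, "avg_rss_kb": 2100},
    {"command": "java", "cpu_sec": 2100.2, "avg_rss_kb": 1500000},
]

for rec in top_consumers(RECORDS, "cpu_sec", count=2):
    print(rec["command"], rec["cpu_sec"])
```

Swapping the key to "avg_rss_kb" gives the top memory consumers from the same data.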

Go back to what I said about running "time" on every command. This data could be of used for capacity planning or, with some intelligence, behavior monitoring. Are your users complaining about commands taking too long to run, but when you ask how long they give you a bogus number or simply shrug? Extended Accounting can tell you. Are batch jobs running at night but want a record of when they started and how resource hungry they were? Here is a way that doesn't involve writing wrappers!

In short, Extended Accounting is a pretty lousy billing system on today's multi-core systems, but it can provide useful historical statistics to answer questions that might otherwise be difficult to answer.

Practical Tools

The first tool I'll provide you with is a Perl replacement for the Solaris-included /usr/demo/libexacct/exdump.c: exdebug.pl. This tool offers the following advantages:

  1. exdump.c hasn't been updated for the new Crossbow-provided 'net' accounting data; exdebug.pl is module agnostic and works with them all.
  2. The output is just much cleaner and more intuitive for exploring what ExAcct can do for you.
  3. It's implemented in Perl, making it easier to get in there and build something, rather than dealing with the libexacct learning curve in C. If nothing else you can quickly prototype and then re-implement in C.

Here is an example from a 'net' record:

benr@quadra exacct$ acctadm net
            Net accounting: active
       Net accounting file: /var/adm/exacct/net
     Tracked net resources: extended
   Untracked net resources: none
benr@quadra exacct$ pfexec ./exdebug.pl /var/adm/exacct/net | more
Creator:  SunOS
Hostname: quadra

---------------- OBJECT 0 -----------------------
Object is: EO_GROUP   -   Catalog: EXT_GROUP EXC_DEFAULT EXD_GROUP_NET_LINK_DESC
                Id: EXD_NET_DESC_NAME   Value: testzone0
                Id: EXD_NET_DESC_EHOST  Value: 
                Id: EXD_NET_DESC_EDEST  Value: 
                Id: EXD_NET_DESC_VLAN_TPID      Value: 0
                Id: EXD_NET_DESC_VLAN_TCI       Value: 0
                Id: EXD_NET_DESC_SAP    Value: 0
                Id: EXD_NET_DESC_PRIORITY       Value: 0
                Id: EXD_NET_DESC_BWLIMIT        Value: 0
                Id: EXD_NET_DESC_DEVNAME        Value: testzone0
                Id: EXD_NET_DESC_V4SADDR        Value: 0
                Id: EXD_NET_DESC_V4DADDR        Value: 0
                Id: EXD_NET_DESC_SPORT  Value: 0
                Id: EXD_NET_DESC_DPORT  Value: 0
                Id: EXD_NET_DESC_PROTOCOL       Value: 0
                Id: EXD_NET_DESC_DSFIELD        Value: 0
...
---------------- OBJECT 67 -----------------------
Object is: EO_GROUP   -   Catalog: EXT_GROUP EXC_DEFAULT EXD_GROUP_NET_LINK_STATS
                Id: EXD_NET_STATS_NAME  Value: e1000g1
                Id: EXD_NET_STATS_CURTIME       Value: 1256033841
                Id: EXD_NET_STATS_IBYTES        Value: 2411692067
                Id: EXD_NET_STATS_OBYTES        Value: 202604900
                Id: EXD_NET_STATS_IPKTS         Value: 2005669
                Id: EXD_NET_STATS_OPKTS         Value: 1265178
                Id: EXD_NET_STATS_IERRPKTS      Value: 0
                Id: EXD_NET_STATS_OERRPKTS      Value: 0
---------------- OBJECT 68 -----------------------
Object is: EO_GROUP   -   Catalog: EXT_GROUP EXC_DEFAULT EXD_GROUP_NET_FLOW_STATS
                Id: EXD_NET_STATS_NAME  Value: inbound_ssh
                Id: EXD_NET_STATS_CURTIME       Value: 1256033841
                Id: EXD_NET_STATS_IBYTES        Value: 93958770
                Id: EXD_NET_STATS_OBYTES        Value: 106077944
                Id: EXD_NET_STATS_IPKTS         Value: 238395
                Id: EXD_NET_STATS_OPKTS         Value: 321977
                Id: EXD_NET_STATS_IERRPKTS      Value: 0
                Id: EXD_NET_STATS_OERRPKTS      Value: 0
---------------- OBJECT 69 -----------------------
Object is: EO_GROUP   -   Catalog: EXT_GROUP EXC_DEFAULT EXD_GROUP_NET_LINK_STATS
                Id: EXD_NET_STATS_NAME  Value: testzone0
                Id: EXD_NET_STATS_CURTIME       Value: 1256033861
                Id: EXD_NET_STATS_IBYTES        Value: 4528169
                Id: EXD_NET_STATS_OBYTES        Value: 0
                Id: EXD_NET_STATS_IPKTS         Value: 64405
                Id: EXD_NET_STATS_OPKTS         Value: 0
                Id: EXD_NET_STATS_IERRPKTS      Value: 0
                Id: EXD_NET_STATS_OERRPKTS      Value: 0

In the above example you'll see the variety of objects offered by the net accounting module, including link descriptions, link statistics ('testzone0' is a VNIC and 'e1000g1' is a physical interface), and flow statistics (inbound_ssh is a flowadm defined flow).
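
If you want to post-process this output rather than just read it, the text is regular enough to parse. The following Python sketch assumes the layout is exactly what the sample above shows (an "Object is: ... Catalog: ..." line followed by "Id: ... Value: ..." lines); that is an inference from the sample, not a documented format, and the function name is hypothetical.

```python
import re

# Patterns inferred from the sample output above, not from a documented format.
OBJECT_RE = re.compile(r"^Object is:\s+(\S+)\s+-\s+Catalog:\s+(.*)$")
FIELD_RE = re.compile(r"^Id:\s+(\S+)\s+Value:\s*(.*)$")


def parse_exdebug(text):
    """Turn exdebug-style text into a list of {type, catalog, fields} dicts."""
    objects = []
    for raw in text.splitlines():
        line = raw.strip()
        match = OBJECT_RE.match(line)
        if match:
            objects.append({
                "type": match.group(1),
                "catalog": match.group(2).split(),
                "fields": {},
            })
            continue
        match = FIELD_RE.match(line)
        if match and objects:
            # Values stay as strings; some ids carry names, others numbers.
            objects[-1]["fields"][match.group(1)] = match.group(2)
    return objects


SAMPLE = """\
---------------- OBJECT 67 -----------------------
Object is: EO_GROUP   -   Catalog: EXT_GROUP EXC_DEFAULT EXD_GROUP_NET_LINK_STATS
                Id: EXD_NET_STATS_NAME  Value: e1000g1
                Id: EXD_NET_STATS_CURTIME       Value: 1256033841
                Id: EXD_NET_STATS_IBYTES        Value: 2411692067
                Id: EXD_NET_STATS_OBYTES        Value: 202604900
---------------- OBJECT 68 -----------------------
Object is: EO_GROUP   -   Catalog: EXT_GROUP EXC_DEFAULT EXD_GROUP_NET_FLOW_STATS
                Id: EXD_NET_STATS_NAME  Value: inbound_ssh
                Id: EXD_NET_STATS_CURTIME       Value: 1256033841
                Id: EXD_NET_STATS_IBYTES        Value: 93958770
                Id: EXD_NET_STATS_OBYTES        Value: 106077944
"""

objects = parse_exdebug(SAMPLE)

# Pick out flow statistics by the last element of the catalog.
flows = [o for o in objects if o["catalog"][-1] == "EXD_GROUP_NET_FLOW_STATS"]
for flow in flows:
    f = flow["fields"]
    print(f["EXD_NET_STATS_NAME"], int(f["EXD_NET_STATS_IBYTES"]), "bytes in")
```

Selecting on the last catalog element is what separates link descriptions, link statistics, and flow statistics in this sketch.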

The second tool is exacctly, a human-friendly Proc Extended Accounting dumper. It is also implemented in Perl and in fact was derived from the exdebug app above.

benr@quadra exacct$ acctadm proc
         Process accounting: active
    Process accounting file: /var/adm/exacct/proc
  Tracked process resources: extended
Untracked process resources: host
benr@quadra exacct$ pfexec ./exacctly /var/adm/exacct/proc | more
Creator:  SunOS
Hostname: quadra

      ZONE    UID    GID    PID                  CMD |   Real   User        Sys |               Start Date |    RSS AVG      RSS MAX     SysCalls      Swaps 
 ----------------------------------------------------+--------------------------+--------------------------+--------------------------------------------------
    global      0      0   1922              acctadm |   0.07   0.00       0.01 | Tue Oct 20 03:10:01 2009 |        524 K      12904 K        450          0 | FULL
    global      0      0   1920                   sh |   0.07   0.00       0.00 | Tue Oct 20 03:10:01 2009 |       2036 K      12904 K        103          0 | FULL
    global     25     25   1924             sendmail |   0.10   0.01       0.01 | Tue Oct 20 03:10:01 2009 |       1912 K      12904 K        543          0 | FULL
    global      0      0   1927             sendmail |   0.01   0.00       0.01 | Tue Oct 20 03:10:01 2009 |       2288 K      13172 K        267          0 | FULL
    global      0      0   1923                 mail |   0.10   0.00       0.00 | Tue Oct 20 03:10:01 2009 |        504 K      12904 K        169          0 | FULL
    global      0      0   1921                   sh |   0.11   0.00       0.00 | Tue Oct 20 03:10:01 2009 |        920 K      12904 K        102          0 | FULL

The output is really wide, but everyone should have a big ol' screen these days. Notice the depth of information here. For each terminated process we see the zone it was in, user and group, PID and the command name itself (ExAcct doesn't record arguments), then we see real/user/sys time in seconds (ExAcct actually has nanosecond granularity, so these are rounded numbers), the start time and other goodness. The last column reports whether the record is full or partial.
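
To show where those rounded numbers come from, here is a short Python sketch that rebuilds the Real/User/Sys columns for the acctadm row (PID 1922) from the raw SEC/NSEC pairs. The values are copied from the raw proc record for that process shown in the Parting Thoughts section of this post; the helper name is hypothetical, and this is not necessarily how exacctly itself does the arithmetic.

```python
NSEC_PER_SEC = 1_000_000_000


def to_nsec(sec, nsec):
    """Combine a seconds/nanoseconds pair into integer nanoseconds."""
    return sec * NSEC_PER_SEC + nsec


# Raw values from the proc record for PID 1922 (acctadm).
record = {
    "EXD_PROC_CPU_USER_SEC": 0, "EXD_PROC_CPU_USER_NSEC": 2047013,
    "EXD_PROC_CPU_SYS_SEC": 0, "EXD_PROC_CPU_SYS_NSEC": 6237135,
    "EXD_PROC_START_SEC": 1256033401, "EXD_PROC_START_NSEC": 311640743,
    "EXD_PROC_FINISH_SEC": 1256033401, "EXD_PROC_FINISH_NSEC": 380283918,
}

# Real time is finish minus start; user and sys are stored directly.
real_ns = (to_nsec(record["EXD_PROC_FINISH_SEC"], record["EXD_PROC_FINISH_NSEC"])
           - to_nsec(record["EXD_PROC_START_SEC"], record["EXD_PROC_START_NSEC"]))
user_ns = to_nsec(record["EXD_PROC_CPU_USER_SEC"], record["EXD_PROC_CPU_USER_NSEC"])
sys_ns = to_nsec(record["EXD_PROC_CPU_SYS_SEC"], record["EXD_PROC_CPU_SYS_NSEC"])

# Round to two decimal places, as in the table above.
real, user, sys_time = (f"{ns / NSEC_PER_SEC:.2f}" for ns in (real_ns, user_ns, sys_ns))
print(real, user, sys_time)  # prints "0.07 0.00 0.01"
```

The process really ran for about 68.6 milliseconds of wall-clock time, which is why the table shows 0.07 for real time and 0.00 for user time.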

This tool is, in and of itself, enough to get many administrators who might otherwise have ignored Extended Accounting to start using it. Even more so, I hope it sparks your interest and imagination as to the possibilities! Just think of all the ways to amaze your boss and fellow admins!

Data File Rotation

Like any log, don't be lazy and forget to rotate those files or you'll have a mess on your hands. Rotating your extended accounting data files will make them easier to dissect and consume less disk. Here are some example lines you can drop into /etc/logadm.conf, the configuration file for logadm, Solaris's default log rotation tool:

/var/adm/exacct/proc -N -p 1d -C 7 -b '/usr/sbin/acctadm -x process' -a '/usr/sbin/acctadm -e extended -f /var/adm/exacct/proc process'
/var/adm/exacct/net -N -p 1d -C 7 -b '/usr/sbin/acctadm -x net' -a '/usr/sbin/acctadm -e extended -f /var/adm/exacct/net net'
/var/adm/exacct/task -N -p 1d -C 7 -b '/usr/sbin/acctadm -x task' -a '/usr/sbin/acctadm -e extended -f /var/adm/exacct/task task'

These examples will rotate each day (-p 1d) and keep 7 logs (-C 7) before deleting the oldest. The important bit is that you can't just mv the file: you need to stop accounting, rotate, then resume it. That is what the -b (run before rotation) and -a (run after rotation) commands do here.
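
The three lines differ only in the file name and the acctadm type, which makes them easy to mistype. As a convenience, here is a minimal Python sketch that generates them from one template. It only reproduces the flags used in the lines above; the mapping and function names are hypothetical.

```python
# Sketch: build the logadm.conf entries above from a single template.
# Maps the data file name to the type name that acctadm expects.
ACCT_TYPES = {"proc": "process", "net": "net", "task": "task"}

EXACCT_DIR = "/var/adm/exacct"
ACCTADM = "/usr/sbin/acctadm"


def logadm_line(file_name, acct_type, period="1d", count=7):
    """Return one logadm.conf entry that stops, rotates, and resumes accounting."""
    path = f"{EXACCT_DIR}/{file_name}"
    stop = f"{ACCTADM} -x {acct_type}"                      # run before rotation
    resume = f"{ACCTADM} -e extended -f {path} {acct_type}"  # run after rotation
    return f"{path} -N -p {period} -C {count} -b '{stop}' -a '{resume}'"


lines = [logadm_line(name, kind) for name, kind in ACCT_TYPES.items()]
print("\n".join(lines))
```

Note that the proc file pairs with the type name "process", which is the one spot where the file name and the acctadm argument differ.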

Remember to ensure that logadm isn't commented out in the root crontab.

Parting Thoughts & Cautions

Before I wrap up, I want to note something about Process records. Here is one as seen with exdebug:

---------------- OBJECT 0 -----------------------
Object is: EO_GROUP   -   Catalog: EXT_GROUP EXC_DEFAULT EXD_GROUP_PROC
                Id: EXD_PROC_PID        Value: 1922
                Id: EXD_PROC_UID        Value: 0
                Id: EXD_PROC_GID        Value: 0
                Id: EXD_PROC_PROJID     Value: 1
                Id: EXD_PROC_TASKID     Value: 39949
                Id: EXD_PROC_CPU_USER_SEC       Value: 0
                Id: EXD_PROC_CPU_USER_NSEC      Value: 2047013
                Id: EXD_PROC_CPU_SYS_SEC        Value: 0
                Id: EXD_PROC_CPU_SYS_NSEC       Value: 6237135
                Id: EXD_PROC_START_SEC  Value: 1256033401
                Id: EXD_PROC_START_NSEC         Value: 311640743
                Id: EXD_PROC_FINISH_SEC         Value: 1256033401
                Id: EXD_PROC_FINISH_NSEC        Value: 380283918
                Id: EXD_PROC_COMMAND    Value: acctadm
                Id: EXD_PROC_TTY_MAJOR  Value: 4294967295
                Id: EXD_PROC_TTY_MINOR  Value: 4294967295
                Id: EXD_PROC_FAULTS_MAJOR       Value: 0
                Id: EXD_PROC_FAULTS_MINOR       Value: 0
                Id: EXD_PROC_MESSAGES_SND       Value: 0
                Id: EXD_PROC_MESSAGES_RCV       Value: 0
                Id: EXD_PROC_BLOCKS_IN  Value: 0
                Id: EXD_PROC_BLOCKS_OUT         Value: 0
                Id: EXD_PROC_CHARS_RDWR         Value: 20100
                Id: EXD_PROC_CONTEXT_VOL        Value: 102
                Id: EXD_PROC_CONTEXT_INV        Value: 0
                Id: EXD_PROC_SIGNALS    Value: 0
                Id: EXD_PROC_SWAPS      Value: 0
                Id: EXD_PROC_SYSCALLS   Value: 450
                Id: EXD_PROC_ACCT_FLAGS         Value: 2
                Id: EXD_PROC_ANCPID     Value: 1920
                Id: EXD_PROC_WAIT_STATUS        Value: 0
                Id: EXD_PROC_ZONENAME   Value: global
                Id: EXD_PROC_MEM_RSS_AVG_K      Value: 524
                Id: EXD_PROC_MEM_RSS_MAX_K      Value: 12904

Okay, lots of data, lots of goodness. Notice EXD_PROC_BLOCKS_IN, OUT, and CHARS_RDWR? They are useless. I can't go into why here, but don't get excited about them or bother doing anything with them; the values are crap. If you're a veteran Kstat diver you'll recognize similar values in the Kstat cpu_stat class... same story.

Hopefully this post has helped provide you with a more practical understanding of Extended Accounting and provided you with some resources to get in there and use the data. There is a wealth of possibilities if you just avail yourself of them. :)


