Thursday, July 11, 2013

In memoriam

In early 2001 I was looking for a new job. My prospects weren’t stellar -- I was a foreign worker with a funny name looking for an IT position during the worst dregs of the dot-com crash, and my resume only had one Programming-Slash-Admin job on it. It’s the kind of resume that hiring managers quickly file in the “if_absolutely_desperate” folder.

So it wasn’t with any particular hope that I responded to a posting for a sysadmin position at Duke University Physics. I mean, it sounded awesome, but I felt it was a very long shot. To my surprise, I received an email back that same evening. The guy’s name was Seth Vidal, and he said he wanted to chat.

“You do realize I’m an H1B, right?” I said. “Getting me hired will be a pain.”
“Duke’s pain, not mine,” he emailed back. “I searched your name and I liked that you’re active on open-source mailing lists.”

[Photo caption: “This was Seth's most common expression, honest.”]

So I drove to Durham for an interview. Seth cringed a bit at my then-preference for Slackware and the fact that I liked to build software from source. He did agree that RPM dependencies were hell, though, and fully shared my view that telnet was evil (ssh was still making inroads at the time). He showed me the server room, which was really a converted men’s shower, containing a few old desks with assorted beige towers on them. There was a utility rack at the back of the room, the kind you buy at a hardware store for your garage, with more beige towers. Many had their sides off for better cooling. It was hot in there, even though the thermostat was cranked to the lowest setting.

“I’m working to replace many of these with some more respectable hardware,” he said. “But I have to stretch the budget right now, and you’ll be amazed at how well Linux runs on this commodity stuff. Heck, we may even get a server room one day that has real racks and doesn’t have water valves right above the file servers.”

I struggled a lot during my first 6 months. Seth was a brilliant sysadmin with a mind-boggling ability to quickly figure out the root cause of this or that problem. Most of the time, I just felt like I was apprenticed to an irascible wizard -- who was nevertheless amazingly tolerant of his pupil’s blunders. Oh, he got grumpy with me -- a lot, actually, and often for very good reasons, in retrospect. I did do stupid things sometimes. Still do. But his scolding was always followed by coaching, and for that I will remain forever grateful. I learned so much from him in the 4 years we worked closely together -- and it went beyond mere tech skills.

Everyone knows of Seth’s contribution to solving the famed “RPM dependency hell” -- he took some core ideas from YUP, the package manager used by Yellow Dog Linux, and rewrote it from scratch, giving us YUM -- a tool sysadmins use on a daily basis. Fewer know that Seth was also instrumental in getting CentOS and Fedora Extras off the ground. I remember when it was all hosted in his office -- a half-rack on wheels that had a 64-bit Dell that was used to build the entirety of FC1 extras, plus an old G4 lying on its side that built PPC packages (those that would build). Another ex-workstation sitting on a shelf behind the half-rack hosted the website. All the exhaust fans made Seth’s office a really loud place to work from, and I knew it gave him headaches, but he left the rack there anyway -- our server room was over its cooling capacity and he really wanted Fedora to grow and succeed. Little wonder that Red Hat eventually snatched him up to make it his full-time job.

If I were to think of Seth’s one core principle -- which was both his fault and the damned best thing about him -- that would be his genuineness. Anyone who’s ever participated in a mailing list discussion with him knows his famously curt manner and his complete lack of desire to entertain (what he deemed were) foolish ideas. He spoke his heart and his mind, and he was widely known for it. But Seth also cared -- passionately, genuinely cared. He cared about his friends and about his family. He cared about his dog and about his co-workers (pretty sure it went in that order, but Cori is a very cute dog). He cared about his projects and about his ideals. When he asked you “hey, you doing okay?” that wasn’t because of some feeling of social obligation. He really wanted to know if you were okay.

His friends, in return, genuinely cared about him -- even if they secretly told each other that they needed a “Seth break” every now and again.

I left Duke in 2005, exchanging the stars-and-stripes for the maple leaf. While we remained in close touch, sharing IRC channels just wasn’t the same as working together under one roof. There were times when we didn’t have an occasion to chat for months. There were times when we did it every day. I last saw him when I was in Raleigh in November last year, to participate in a Red Hat-sponsored Fedora Activity Day. He looked older, but he was the same old Seth. We chatted, we laughed at old memories, and we worked side-by-side well into the night.

“You know,” he said, looking up from his laptop. It was past 11 PM and the cleaning crew was wondering what we were all still doing there. “You really suck at C.”
“Yeah, well, so do you.”
“I know, right?”

Seth’s life tragically ended on a summer night when a car slammed into his bike and then drove off. It shouldn't take a tragedy to remind you that life can end abruptly, but somehow it always does, and it makes us very angry. “What a meaningless death,” we say.

What a meaningful life, say I. Seth was only 36, but look how much he managed to accomplish, how much loyalty and respect he commanded, how much merit his opinion had among his peers. For his having been here, this world is richer, and for his passing it is now poorer.

We can all add meaning to our lives if we stop treating life as some kind of mundane and exasperating filler between weekends, holidays, and those fleeting breaks every now and again when we get a minute to do things we enjoy. We call it “the human condition,” and we avoid looking at each other when we say that. But I truly believe that if we are just a bit more genuine, and a bit more passionate, and a bit more caring, then perhaps we will no longer have to use apologetic clichés when talking about our own lives.

We owe it to ourselves, and we owe it to our friend.

I miss you terribly, Seth. Rest in Peace.

Monday, March 04, 2013

Linux Foundation is looking for an awesome sysadmin

There is another fairly large project joining the Linux Foundation in the near future and we are looking for a senior systems administrator to join our Linux Foundation IT team. Normally, I'd advertise such a position as "work from anywhere in the US, Canada or Australia," but we're actually looking to fill this position in Portland, OR, as we need a few more hands physically located near our main datacentre.

So, if you're not already in Portland, OR, you have to be willing to move to Portland, OR. So, I guess, I must also add "must like good food, mild weather and weird people" to the list of job requirements. :)

We are looking for the following skills:
  • Excellent knowledge of RHEL-6
  • Familiarity with NetApp appliances
  • Good knowledge of networking
    • VLANs, iptables, IPv4, IPv6, etc
    • HP ProCurve switches
    • Juniper routers (a plus)
  • Yum, RPM
  • Puppet (+ Func)
  • SELinux
  • Load balancing using nginx, haproxy, LVS or similar
  • OpenVPN
  • Git (and gitolite)
  • KVM+Qemu+libvirt/virsh
  • Apache, MySQL
  • Familiarity with some Java administration (i.e. "must know where to put WARs")
  • Good measure of excellence and awesomeness
Among perks:
  • Work from home (as long as this home is within driving distance of Portland)
  • Receive excellent benefits
    • Health
    • Dental
    • 401(K)
    • etc
  • Attend LF conferences in fun places (LinuxCon, Collab summit, etc)
  • Do cool things with cool people
  • Feel awesome about your work
If that sounds good, please send your resume to me: konstantin at linuxfoundation dot org.

Monday, February 18, 2013

My guest blog post

My guest blog post on the 2-factor authentication work we did with Fedora Infrastructure is up. I tried to keep it aimed at a general audience, which is usually the kind frequenting that site. Not sure if it counts as a "guest" post if I actually work for the Linux Foundation, but I call it a "guest blog post" because I don't usually post anything there. :)

I got some flak for my recommendation to always use "/usr/bin/ssh" and "/usr/bin/sudo -i" instead of just "ssh" and "sudo". The argument is that if someone is able to modify your $PATH, they can probably modify your ~/.bash_profile or ~/.bashrc to load a trojaned version of bash. Yes, very true. However, typing in an extra "/usr/bin/" in front of those two commands will only inconvenience you a little bit, while an attacker would have to be sophisticated enough to replace your whole shell. I stand by my recommendation, at least for the case of "/usr/bin/sudo -i".
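The risk behind that recommendation is ordinary $PATH lookup: the shell runs the first executable with a matching name, scanning $PATH left to right. A minimal Python sketch (POSIX only) using `shutil.which`, which follows the same left-to-right search, shows how a directory prepended to $PATH shadows a real `sudo`. The "real" and "attacker" directories and their stub scripts are hypothetical stand-ins created in temporary locations.

```python
import os
import shutil
import stat


def resolve(command: str, search_path: str):
    """Return the file a $PATH-style search would run for `command`."""
    return shutil.which(command, path=search_path)


def make_stub(directory: str, name: str) -> str:
    """Create a harmless executable shell script named `name` in `directory`."""
    stub = os.path.join(directory, name)
    with open(stub, "w") as fh:
        fh.write("#!/bin/sh\necho \"stub in $0\"\n")
    os.chmod(stub, stat.S_IRWXU)  # owner read/write/execute
    return stub
```

Whichever directory comes first wins, so an attacker only needs to get one directory in front of /usr/bin; typing the absolute path skips the search entirely.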

The recommendation list also originally had "don't run as user unconfined -- use staff_u or user_u," but I had to remove it, because it would have required at least a few more paragraphs in an already long piece. So, if I were to list my top 5 recommendations for significantly improving your Linux workstation security, in decreasing order of importance, they would be (assuming you use Fedora or RHEL):

  1. Install NoScript for Firefox or ScriptSafe for Chrome/Chromium. It's an inconvenience worth suffering, considering very recent large company compromises that were done via browsers.
  2. Keep your workstation patched. If you don't like frequent changes, you can apply only security-sensitive patches using "yum --security update-minimal" (requires yum-plugin-security).
  3. Always leave SELinux in enforcing. Unlearn "setenforce 0" and use "semanage permissive -a [domain_t]" to only put specific SELinux domains into permissive mode.
  4. Run as staff_u (if you need to sudo) or user_u (if you don't). You can switch using "usermod -Z staff_u [username]". The change requires logout/login to take effect.
  5. Use long passphrases that are easy to remember and type instead of short passwords that are hard to remember and type. Don't reuse important passwords anywhere. Change them every now and again.
I don't list physical security measures, since those usually are out of your hands, but it basically goes "don't let attackers get a hold of your systems, because then all bets are off." :) Technologies such as secure boot and disk encryption go a long way towards easing a lot of concerns, but they, too, are merely deterrents.
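For item 5, one common way to get a passphrase that is long yet easy to type is to pick several words uniformly at random from a large wordlist (the "diceware" approach). A minimal Python sketch using the standard `secrets` module; the wordlist you pass in is up to you, and a real one should have thousands of entries, since each word drawn from a 7,776-word list adds about 12.9 bits.

```python
import math
import secrets


def make_passphrase(wordlist, words: int = 6, sep: str = " ") -> str:
    """Pick `words` entries uniformly at random using a CSPRNG."""
    pool = sorted(set(wordlist))  # drop duplicates so every pick is uniform
    if len(pool) < 2:
        raise ValueError("need at least two distinct words")
    return sep.join(secrets.choice(pool) for _ in range(words))


def entropy_bits(wordlist_size: int, words: int) -> float:
    """Entropy of `words` independent uniform picks from the list."""
    return words * math.log2(wordlist_size)
```

Six words from a 7,776-word list give `entropy_bits(7776, 6)`, roughly 77.5 bits, while remaining plain words you can type quickly.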

Thursday, December 27, 2012

Limitations of the Google Authenticator PAM module

I have blogged about totpcgi in the past, and one of the common questions I hear back is "why not just use Google's own PAM module?" There are three main reasons:

1. Google's PAM module does not scale securely

Google Authenticator is just an implementation of OATH's TOTP, which stands for "Time-based One-Time Password." It really just boils down to a way to convert a timestamp into a 6-digit number using a preshared secret known only to your phone and to the authentication server. The important words here are "One-Time" -- an attacker should not be able to reuse the password after it's been used once. But this is where the PAM module falls short.
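To make "convert a timestamp into a 6-digit number" concrete, here is a minimal Python sketch of TOTP as specified in RFC 6238 (HMAC-SHA1, 30-second steps, and the dynamic truncation from RFC 4226 HOTP). It is an illustration of the standard, not the PAM module's code.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 of an 8-byte big-endian counter,
    reduced to `digits` decimal digits by dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low nibble of the last byte picks 4 bytes
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, timestamp=None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP whose counter is the number of `step`-second
    intervals since the Unix epoch."""
    if timestamp is None:
        timestamp = time.time()
    return hotp(secret, int(timestamp) // step, digits)
```

Google Authenticator secrets are base32-encoded, so they would be decoded first, e.g. `totp(base64.b32decode("JBSWY3DPEHPK3PXP"))`. Note that nothing in this math enforces "one-time": any verifier holding the secret computes the same code for the same 30-second window, so rejecting reuse depends entirely on server-side state.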

Google's PAM module works by installing a file containing the secret, scratch tokens, plus a few other parameters (more on that later) into either the user's home directory or some other location on the system you are trying to secure via 2-factor authentication. The default configuration expects to find that file in $HOME/.google_authenticator, but it can also be installed centrally in /etc or /var and made to work via a few PAM tricks.
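As I understand the file format, the base32 secret sits on the first line, option lines start with a double quote, and the remaining lines are 8-digit scratch codes; with DISALLOW_REUSE enabled, recently used time steps are appended to that option's line. A minimal Python parser under those assumptions (the sample contents in the test are made up):

```python
def parse_google_authenticator(text: str) -> dict:
    """Split a ~/.google_authenticator-style file into its parts.

    Assumed layout: line 1 = base32 secret; lines starting with '"' are
    options (a name followed by optional values); other lines = scratch codes.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty file")
    result = {"secret": lines[0], "options": {}, "scratch_codes": []}
    for line in lines[1:]:
        if line.startswith('"'):
            parts = line.lstrip('"').split()
            if parts:
                name, *values = parts
                result["options"][name] = values
        else:
            result["scratch_codes"].append(line)
    return result
```

The secret and the ever-changing list of used time steps end up in the same file, so recording a login means rewriting the file that holds the secret.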

However, if you have more than one server you are trying to secure (and most admins do), you cannot install the same secret file on multiple servers, as this would break the "One-Time" part of "TOTP." Since the servers don't share PAM state data with each other, an attacker can take the token you used to authenticate on serverA, and reuse that token to authenticate on serverB. Since serverB has no way of knowing that the token has already been used on serverA, the authentication succeeds, thus largely defeating the purpose of 2-factor authentication.
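The replay problem can be modelled in a few lines. In this hypothetical Python sketch, each `Server` holds its own copy of the same secret and its own set of already-used time steps, which is roughly what per-host PAM state amounts to. `make_code` is a simplified stand-in for a real TOTP code (not RFC 6238's truncation), and the time step is passed in rather than read from the clock.

```python
import hashlib
import hmac


def make_code(secret: bytes, step: int) -> str:
    """Simplified stand-in for a TOTP code: any deterministic function
    of (secret, time step) is enough to show the problem."""
    digest = hmac.new(secret, step.to_bytes(8, "big"), hashlib.sha1).hexdigest()
    return str(int(digest, 16) % 1_000_000).zfill(6)


class Server:
    """A host with its own copy of the secret and its own used-step record."""

    def __init__(self, name: str, secret: bytes):
        self.name = name
        self.secret = secret
        self.used_steps = set()  # local state only; never shared with peers

    def authenticate(self, step: int, code: str) -> bool:
        if step in self.used_steps:
            return False  # a replay on *this* host is caught
        if not hmac.compare_digest(code, make_code(self.secret, step)):
            return False
        self.used_steps.add(step)
        return True
```

serverA rejects a second use of the code, but serverB has never seen that time step and accepts it. Giving each host its own secret, or sharing the used-step record, is what closes the gap.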

The only way to remain secure is to use different secrets on serverA and serverB, but clearly this does not scale past 2-3 nodes and 2-3 users. If you have 10 users and 10 servers, you have to provision and keep track of a hundred secrets.

The only other way to remain secure when using the Google Authenticator PAM module is to store the secret and state data in some central location, such as on a shared filesystem. But here we run into another problem.

2. Google Authenticator stores the secret and state data in the same file

I'm rather surprised at this architectural decision -- usually you want to keep your configuration data and your state data in separate locations, if only because this will minimize the risk that a failed attempt at saving your state will overwrite your Google Authenticator secret and thus lock you out.

This is an architecture choice that comes from the era before SELinux or other mandatory access control systems. On a system with "classic" Unix permissions based on users and groups, a root process reading /etc/passwd would be able to also write to /etc/passwd, but this is no longer true on a SELinux-enabled system. These days we can tightly limit which files a process may write to and which it may only read -- even if the process is running as root.
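A minimal Python sketch of the separation argued for above, assuming a hypothetical layout with the secret and the state in two different files. The secret is only ever read; state updates go to a temporary file that is atomically renamed over the old one with `os.replace`, so a failed write can neither clobber the secret nor leave a half-written state file.

```python
import json
import os
import tempfile
from pathlib import Path


def load_secret(secret_path: Path) -> bytes:
    """Configuration: read-only, never rewritten by the auth code."""
    return secret_path.read_bytes()


def load_state(state_path: Path) -> dict:
    """State (hypothetical schema): a missing file means a fresh start."""
    if not state_path.exists():
        return {"last_used_step": None}
    return json.loads(state_path.read_text())


def save_state(state_path: Path, state: dict) -> None:
    """Write to a temp file in the same directory, then atomically rename
    it over the old state; a crash leaves the previous state intact."""
    fd, tmp_name = tempfile.mkstemp(dir=state_path.parent, prefix=".state-")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(state, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, state_path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

With the two files apart, a MAC policy such as SELinux could let the process read the secret's file type while allowing writes only to the state directory.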

So, if we were trying to distribute token files to multiple systems in order to be able to (securely) use the same preshared secrets on multiple servers, the fact that both the secret and state data are in the same file would rule out NFS and many similar networked filesystems -- we don't want the secrets travelling in cleartext over the wires. For the central storage to be secure, you'd have to find a stable and reliable networked filesystem that supports encryption in transit, and that suddenly makes the problem really complicated as such filesystems are few in number. It also doesn't solve the last problem.

3. Your preshared secrets are only as good as your weakest system

Even if you found a way to securely distribute Google Authenticator state data across multiple servers, you'd still have the problem that a root-level security incident on any one of your servers would require that all Google Authenticator tokens be immediately reissued. Not much of a concern if you only have a few users, but if you have dozens or hundreds of them, this becomes a nightmare scenario. You can't have your entire infrastructure grind to a halt while you feverishly reissue tokens to users only because someone did something dumb on "wwwtest02" and an attacker managed to gain root-level access and could read all preshared secrets.

So, all of the above is what prompted us to come up with totpcgi, which securely centralizes Google Authenticator-based 2-factor authentication infrastructure and adds a few features we liked, such as accepting "password+token" in one field and optionally encrypting the preshared secret with the user's password -- so that even if something goes wrong and the attacker is able to read all preshared secrets, the information will not be of any use without also knowing the users' passwords.
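Accepting "password+token" in one field only needs a fixed-length split. A minimal Python sketch, assuming 6-digit tokens appended to the end of the password; it illustrates the idea and is not totpcgi's actual code.

```python
def split_password_token(field: str, token_length: int = 6):
    """Split "password+token" typed into one field into (password, token).

    Assumes the token is the last `token_length` characters and consists
    of ASCII digits; the password itself may contain anything.
    """
    if len(field) <= token_length:
        raise ValueError("field too short to hold both a password and a token")
    password, token = field[:-token_length], field[-token_length:]
    if not (token.isascii() and token.isdigit()):
        raise ValueError("field does not end with a numeric token")
    return password, token
```

Because the split is by position rather than by searching for digits, passwords that themselves end in digits are handled correctly.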

Monday, December 10, 2012

Using rsyslog with Netapp's snaplock

Netapp's "snaplock" technology allows one to create "write-once, read-many" ("WORM") volumes that allow data to be written but not modified or deleted -- especially not if "compliance" mode is used. While this is not a true "WORM" -- it's still done entirely in software and therefore can theoretically be hacked -- it adds an extra layer of security to your infrastructure, especially if you already rely on NetApp filers for your NAS needs.

Any file stored on a snaplock volume can be given WORM protection by first doing a "touch" to set its atime to a date in the future, and then setting a read-only mode on the file. If, after setting "read-only" on a file, you give it read-write permission, the file will be put into "append" mode -- data can be appended to the file, but no previously stored data can be modified or deleted (data is "locked" in 256K chunks).

This is extremely handy for storing system logs or copies of emails for archival purposes. We have a syslog aggregator that receives all our system logs, including auditd. I wrote a simple script that runs out of /etc/cron.daily that pre-creates the "append-only" locations for rsyslog to write to, plus sets read-only on yesterday's logs. Here's the code:
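A minimal Python sketch of what such a daily job could look like, following the touch, read-only, read-write sequence described above. The directory layout (`<base>/<YYYY-MM-DD>/<name>.log`), the log names, and the retention period are hypothetical; on an ordinary filesystem these are plain `utime`/`chmod` calls with no WORM effect.

```python
import datetime as dt
import os
import stat
from pathlib import Path

READ_ONLY = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH  # 0o444
READ_WRITE = READ_ONLY | stat.S_IWUSR                    # 0o644


def prepare_append_only(path: Path, retain_until: dt.datetime) -> None:
    """Pre-create an empty log file and switch it to append mode."""
    if path.exists():
        return  # already prepared; never touch a committed file again
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()
    # A future atime is how the retention date is set on SnapLock.
    os.utime(path, (retain_until.timestamp(), path.stat().st_mtime))
    os.chmod(path, READ_ONLY)   # commit the (empty) file to WORM
    os.chmod(path, READ_WRITE)  # writable again: append-only mode


def lock_read_only(path: Path) -> None:
    """Stop further appends to a finished log file."""
    if path.exists():
        os.chmod(path, READ_ONLY)


def daily_run(base: Path, today: dt.date, retention_days: int, names) -> None:
    """Prepare tomorrow's files and lock yesterday's (hypothetical layout)."""
    tomorrow = today + dt.timedelta(days=1)
    yesterday = today - dt.timedelta(days=1)
    retain_until = (dt.datetime.combine(tomorrow, dt.time())
                    + dt.timedelta(days=retention_days))
    for name in names:
        prepare_append_only(base / tomorrow.isoformat() / f"{name}.log", retain_until)
        lock_read_only(base / yesterday.isoformat() / f"{name}.log")
```

Pre-creating tomorrow's files means the logger only ever appends to a file that is already in append mode.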

On rsyslog's side of things you'll need something like the following in order to write to these locations:

And, finally, you'll need to do some SELinux manipulations in order to allow rsyslog to write to the NFS location, such as setting the mount context to var_log_t. If you do that, then you'll need the following SELinux policy in order to allow the cron script from above to run:

Tuesday, July 31, 2012

Duplicity as stateful rsync


Let's say you have a very large master mirror serving hundreds of gigabytes of data -- most of it in small files that don't change very often. Let's also say you have 100 hosts that want an exact copy of that data -- full or partial -- and they want it mirrored as quickly as possible so that any changes on the master are quickly available on the hosts.
What do you do?


Naturally, you would use rsync. However, you quickly realize that it is extremely inefficient due to its stateless nature and is therefore really poor at scaling up.
The way rsync works is by comparing various file attributes between the master mirror and the host. If you have a million files that you must mirror, on each run it will check a million files. If you have a hundred hosts that want to replicate your master data every fifteen minutes, you will be performing 400,000,000 file checks each hour (1,000,000 files × 100 hosts × 4 runs), reducing your master mirror to tears (or at least the administrator of that mirror).

Rsync --write-batch

You try to use rsync --write-batch on the master mirror and copy that file to your hosts instead. However, this quickly becomes a mess. If one of the mirroring hosts is down for maintenance and misses the propagation window, it will need to have multiple batches applied in order to catch up, or just use plain rsync to the mirror. If you have any number of hosts up or down, then you have to write thousands of lines of code just to keep track of which batch files can be applied and which must revert to rsync.
Moreover, if one of the hosts only wants a subset of your data, they will end up downloading a lot of batch data that they don't care about.

So, what do you do?

Basically, you are stuck. There is not a single open-source solution (that I am aware of) that will allow one to efficiently propagate file changes.
The solution I am proposing doesn't yet exist, but I believe that it's a feature that can be easily added to an existing tool that is already extremely good at keeping track of changes and creating bandwidth-efficient deltas.

What is duplicity

From the project's web page:
Duplicity backs directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. Because duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. Because duplicity uses GnuPG to encrypt and/or sign these archives, they will be safe from spying and/or modification by the server. The duplicity package also includes the rdiffdir utility. Rdiffdir is an extension of librsync's rdiff to directories -- it can be used to produce signatures and deltas of directories as well as regular files. These signatures and deltas are in GNU tar format.
Let's ignore the encryption bits for the moment -- that is an optional switch and duplicity can create unencrypted volumes. After duplicity establishes a baseline of your backup, it will then create space-efficient deltas of any changes and store them in clearly designated, timestamped files -- "difftars". It will also create a full file manifest with checksums of all files to make it easy to figure out during the next run which files were changed or added.

How does that help?

Think of it in the following terms -- instead of publishing full trees of your files on the master, you publish a duplicity backup repository. In order to create the initial mirror, your hosts will effectively do a "duplicity restore", which will download all duplicity volumes from the master and do a full restore.
Here's where we get to the functionality that doesn't yet exist. Once the initial data mirror is complete, your hosts will periodically connect to the master to see if new duplicity manifest files are available. If there are new manifest files, that means there is new data to download. The hosts will then use manifest files to download the difftars with change deltas, and effectively "replay" them on their local backup mirrors.
Periodically, hosts can also go through the file signatures in manifest files to make sure that the local tree is still in sync with the master mirror. Since all manifests carry the mapping of files to difftars, if for some reason there are local discrepancies, hosts can download the necessary volumes from the master mirror in order to correct any errors.
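To make the idea concrete, here is a toy Python model of the proposed workflow. It is entirely hypothetical: it is not duplicity's API or file format, and an in-memory mapping of path to content stands in for manifests and difftars. The point is that a host asks only for change sets newer than the last one it replayed, so the master never rescans the tree on its behalf.

```python
from dataclasses import dataclass, field


@dataclass
class Manifest:
    """One published change set: path -> new content, or None if deleted."""
    version: int
    changes: dict


@dataclass
class Master:
    manifests: list = field(default_factory=list)

    def publish(self, changes: dict) -> int:
        version = len(self.manifests) + 1
        self.manifests.append(Manifest(version, dict(changes)))
        return version

    def since(self, version: int):
        # Only the change sets the host hasn't seen yet; no tree scan needed.
        return [m for m in self.manifests if m.version > version]


@dataclass
class Host:
    prefix: str = ""  # mirror only paths under this subtree
    applied: int = 0  # last manifest version replayed locally
    tree: dict = field(default_factory=dict)

    def sync(self, master: Master) -> int:
        """Replay every pending change set in order; return how many."""
        pending = master.since(self.applied)
        for m in pending:
            for path, content in m.changes.items():
                if not path.startswith(self.prefix):
                    continue
                if content is None:
                    self.tree.pop(path, None)
                else:
                    self.tree[path] = content
            self.applied = m.version
        return len(pending)
```

A host that was down for a week simply has more pending change sets on its next `sync`. A real implementation would also use the manifest's file-to-volume mapping to avoid downloading volumes outside a host's subtree, whereas this toy merely skips applying them.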

Nice! What else is awesome about this?

  • Encryption is built into duplicity, so if you wanted to, you could encrypt all your data.
  • Duplicity supports a slew of backends, including ftp and rsync, which are already provided by large mirrors.
  • Duplicity creates difftars in volumes (25MB in size by default, but easily configurable). If one of the hosts needs to carry only a subtree of the master mirror, they can use the manifest mapping to only download the difftars containing the changes they want.
  • If one of the hosts needs to update the mirror once a day instead of every 15 minutes, they can just replay multiple accumulated difftars.
  • If one of the hosts was down for a week, they can replay a week's worth of difftars to catch up.
  • If the master does a full rebase, the hosts can easily recognize this fact and use the manifest and signature files to figure out which volumes out of the new full backup they need to download.

So, what's the catch?

There are two that I'm aware of:
  1. Duplicity does not work like that yet. I believe it would be fairly easy to either extend duplicity proper to support this functionality, or write a separate tool that would build on top and pretty much "import duplicity" (yes, it is written in Python).
  2. You effectively need twice the room on the master mirror -- one copy for the actual trees and another for the duplicity content. This is a small price to pay if the benefit is dramatically reduced disk thrashing.

What can I do to help?

I started a discussion on this topic on the duplicity-talk mailing list. Please feel free to join!
If you read that thread, you will see that I'm willing to obtain funding from the Linux Foundation to help develop this feature either as part of duplicity proper or as a separate add-on tool.
If you write this feature, all mirror admins in the world will instantly love you.

Thursday, May 24, 2012

Senior Systems Administrator at the Linux Foundation

Hello, Fedorans!

There is a fairly large project joining the Linux Foundation in the near future and we are looking for a US- or Canada-based senior systems administrator to join our Linux Foundation IT team. We are looking for the following skills:
  • Excellent knowledge of RHEL-6
  • Familiarity with NetApp appliances
  • Good knowledge of networking
    • VLANs, iptables, IPv4, IPv6, etc
    • HP ProCurve switches
    • Juniper routers a plus
  • Yum, RPM
  • Puppet + Func
  • SELinux
  • Load balancing using nginx, haproxy, LVS
  • OpenVPN
  • Git (and gitolite)
  • KVM+Qemu+libvirt/virsh
  • Postfix+Dovecot
  • Apache, PostgreSQL
Among perks:
  • Work from home
  • Receive excellent benefits
  • Attend LF conferences in fun places
  • Do cool things with cool people
  • Feel awesome about your work
If that sounds good, please send your resume to

Monday, May 14, 2012

My simple IT support rules

  1. Don't be an asshole.
  2. If someone is being an asshole to you, see #1.
  3. Are they upset at something? Identify and fix. Repeat as required.
  4. Assholes are infectious and easily set off. Avoid them, unless #3 applies.

Sunday, April 29, 2012

Higher education as investment opportunity

It strikes me that most people in favour of tuition hikes view higher education as a net loss paid for by their taxes, rather than as an investment that will bring high dividends in the future. I wish more people would approach higher education funding the way venture capitalists approach startups -- as an investment rather than as a cost. Let me explain what I mean.

Statistically, 8 out of 10 startups will fail, costing venture capitalists millions. However, the 2 that succeed will more than cover the losses on the other 8, with lots of extra profit on top, which is why the VCs continue to do it.

Opponents of free higher education tend to point out how many students have trouble finding jobs after they graduate, especially those who chose to major in humanities. However, if we look at it from the same perspective as venture capitalists, it doesn't matter that many students who receive higher education end up working minimum-wage jobs. We as a society reap our monetary and cultural benefits from those few who do succeed.

Averages can be tricky, but we shouldn't ignore them. On average, people with higher education tend to earn more money. People who earn more money pay more in taxes. A province with more highly educated people will on average be more prosperous than a province with fewer highly educated people. This goes beyond just monetary prosperity -- a lot can be said for culture and just plain joie de vivre. Money doesn't tend to stay for very long in dull, desolate places.

But this isn't to say that we shouldn't be smart about investing our money. It doesn't make sense to invest tens of thousands into someone's doctorate degree just to see them move to some other province or country, leaving us to foot the bill. Similarly, it doesn't make sense for all universities to charge the same tuition fees. Some are world-renowned institutions, while others are humble community colleges. One size doesn't fit all.

By changing how we allocate funds, we can both allow universities to charge tuition based on their needs, and hedge our bets to ensure that the money we spend on higher education brings dividends back into our pockets, instead of the pockets of some other provinces or countries.

So, how do we do it?

The government will provide students with a no-interest, inflation-indexed loan to complete their studies. Upon graduation, the government will establish a simple repayment scheme:
  1. If the person resides in the province but has no income, the government simply re-indexes the loan to keep up with inflation.
  2. If the person resides in the province and has income, the government repays a part of their loan. The sum repaid reflects the amount paid in provincial taxes.
  3. If the person leaves the province, the loan is sold to a commercial bank of their choice. The taxpayers are fully reimbursed and repaying the loan becomes that individual's responsibility.
Simple enough?

Let's say Pierre goes to McGill to become a family doctor:
  • McGill charges him $10,000 a year for 5 years of studies, resulting in a $50,000 loan with the government. 
  • After graduating, Pierre has trouble finding work for a couple of years, so his loan remains with the government and grows by 1-2% a year, reflecting inflation. 
  • In his third year, Pierre finds a job and earns $60,000 in income, paying $10,000 in provincial taxes. That year, the government pays a $2,000 instalment towards the loan. 
  • Five years later, Pierre earns $150,000 a year, paying $30,000 in provincial taxes and the government pays a $6,000 instalment towards the loan. 
  • A year later, Pierre finds a job in the US and moves to work there. The remainder of his loan is sold to a commercial bank, which pays off the loan to the taxpayers.

As you see, this is a net win for the government (and us, as taxpayers): not only did we not lose any money on Pierre's education, but we also earned many times as much in taxes, not to mention benefited greatly from his skills as a practising doctor.

Let's take another example:
  • Pauline goes to UQAM to become a teacher. UQAM charges her $8,000 a year for 3 years. Her loan with the government is $24,000. 
  • Upon graduation, she finds a job right away, but only earns $35,000 a year for the first few years, paying $4,000 in provincial taxes. The government pays the minimal instalment of $1,000 towards her loan. 
  • 10 years later, Pauline earns $60,000 a year and pays $10,000 in taxes. Her loan is reimbursed at $2,000 annually. 
  • Pauline never leaves Quebec, and her loan is fully paid off in 17 years, though by then she has completely forgotten that she had a loan at all, as she has never had to worry about payments. 
  • As far as Pauline is concerned, she never paid a dime for going to school (which, of course, she did, via her taxes).
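The repayment arithmetic can be checked with a small simulation. The examples are consistent with the government paying 20% of the provincial tax collected, with a $1,000 minimum instalment (Pauline's first years); both figures are inferred from the examples rather than stated rules. Under those assumptions, and ignoring inflation unless a rate is given, this Python sketch reproduces Pauline's 17 years.

```python
def years_to_repay(principal: float, yearly_taxes, inflation: float = 0.0,
                   share_pct: int = 20, floor: float = 1000.0):
    """Years until the government has fully repaid the loan, or None.

    yearly_taxes[i] is the provincial tax paid in year i+1 after graduation
    (0 = no income that year, so nothing is repaid). Each year the balance is
    first indexed to inflation, then reduced by max(floor, share_pct% of tax).
    """
    balance = principal
    for year, tax in enumerate(yearly_taxes, start=1):
        balance *= 1 + inflation
        if tax > 0:
            balance -= max(floor, tax * share_pct / 100)
        if balance <= 0:
            return year
    return None
```

Pauline's schedule is `years_to_repay(24000, [4000] * 10 + [10000] * 10)`: ten $1,000 instalments, then seven at $2,000. With 2% annual inflation applied to the balance, the same income pattern (extended to 25 years) takes 21 years instead.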

And, let's take a third example:
  • Michael goes to Duke University in North Carolina and finishes his physics degree in 4 years, taking out a loan of $150,000 to pay for his studies. 
  • He finds out about a "Come work for us, we'll help pay your student loans" program in Quebec and finds a job with Hydro for $90,000 a year, paying $20,000 in provincial taxes. Quebec government gives him a $4,000 annual tax credit towards his student loan payments.

Lastly, to prevent anyone from "perpetually staying in school," we can choose an arbitrary ceiling, such as $80,000, after which no more loans will be issued by the government.

This scheme is simple, transparent, and ensures that any amount we spend on higher education is a true investment into the province that stays in the province, all without weighing down our youth with heavy loans at precisely the age when what they need most is a boost.

Wednesday, April 18, 2012

Puppet eyes

Have you ever worked in an environment where some files are in puppet, but not all of them? E.g. when you aren't starting from scratch but have to "puppetize" an existing infrastructure?

Have you ever had to wonder "wait, is this file in puppet?"

Put puppet-eyes.vim in /root/.vim/plugin (or in /usr/share/vim/vimfiles/plugin if you want to enable this globally) and Vim will display an alert when you are trying to edit a file managed by puppet.

Note: To work, the file definition in puppet must have "checksum => md5".
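Outside the editor, one way to answer "is this file in puppet?" is to consult the list of resources the agent records after each run. This minimal Python sketch assumes that list has one resource per line in the form `file[/etc/motd]` (its location depends on the Puppet version and the `resourcefile` setting); it is a swapped-in technique, not necessarily what puppet-eyes.vim does.

```python
import os


def managed_files(resource_list: str) -> set:
    """Collect normalized paths from lines like 'file[/etc/motd]'.

    The one-resource-per-line 'type[title]' format is an assumption.
    """
    paths = set()
    for line in resource_list.splitlines():
        line = line.strip()
        if line.startswith("file[") and line.endswith("]"):
            paths.add(os.path.normpath(line[len("file["):-1]))
    return paths


def is_managed(path: str, resource_list: str) -> bool:
    """True if `path` is declared as a file resource in the list."""
    return os.path.abspath(path) in managed_files(resource_list)
```

Normalizing both sides means paths such as `/etc/ssh/../ssh/sshd_config` still match; symlinks are not resolved.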

Project page on github