Monday, March 04, 2013

Linux Foundation is looking for an awesome sysadmin

There is another fairly large project joining the Linux Foundation in the near future and we are looking for a senior systems administrator to join our Linux Foundation IT team. Normally, I'd advertise such a position as "work from anywhere in US, Canada or Australia," but we're actually looking to fill this position in Portland, OR, as we need a few more hands physically located near our main datacentre.

So, if you're not already in Portland, OR, you have to be willing to move to Portland, OR. So, I guess, I must also add "must like good food, mild weather and weird people" to the list of job requirements. :)

We are looking for the following skills:
  • Excellent knowledge of RHEL-6
  • Familiarity with NetApp appliances
  • Good knowledge of networking
    • Vlans, iptables, ipv4, ipv6, etc
    • HP ProCurve switches
    • Juniper routers (a plus)
  • Yum, RPM
  • Puppet (+ Func)
  • SELinux
  • Load balancing using nginx, haproxy, LVS or similar
  • OpenVPN
  • Git (and gitolite)
  • KVM+Qemu+libvirt/virsh
  • Apache, MySQL
  • Familiarity with some Java administration (i.e. "must know where to put WARs")
  • Good measure of excellence and awesomeness
Among perks:
  • Work from home (as long as this home is within driving distance of Portland)
  • Receive excellent benefits
    • Health
    • Dental
    • 401(K)
    • etc
  • Attend LF conferences in fun places (LinuxCon, Collab summit, etc)
  • Do cool things with cool people
  • Feel awesome about your work
If that sounds good, please send your resume to me: konstantin at linuxfoundation dot org.

Monday, February 18, 2013

My guest blog post on linux.com

My guest blog post on the 2-factor authentication work we did with Fedora Infrastructure is up on Linux.com. I tried to keep it aimed at the general audience, which is usually the kind frequenting linux.com. Not sure if it counts as a "guest" post if I actually work for the Linux Foundation, but I call it "guest blog post" because I don't usually post anything there. :)

I got some flak for my recommendation to always use "/usr/bin/ssh" and "/usr/bin/sudo -i" instead of just "ssh" and "sudo". The argument is that if someone is able to modify your $PATH, they can probably modify your ~/.bash_profile or ~/.bashrc to load a trojaned version of bash. Yes, very true. However, typing in an extra "/usr/bin/" in front of those two commands will only inconvenience you a little bit, while an attacker would have to be sophisticated enough to replace your whole shell. I stand by my recommendation, at least for the case of "/usr/bin/sudo -i".

The recommendation list also originally had "don't run as user unconfined -- use staff_u or user_u," but I had to remove it, because it would have required at least a few more paragraphs in an already long piece. So, if I were to list my top 5 recommendations for significantly improving your Linux workstation security, in decreasing order, they would be (assuming you use Fedora or RHEL):

  1. Install NoScript for Firefox or ScriptSafe for Chrome/Chromium. It's an inconvenience worth suffering, considering very recent large company compromises that were done via browsers.
  2. Keep your workstation patched. If you don't like frequent changes, you can apply only security-sensitive patches using "yum --security update-minimal" (requires yum-plugin-security).
  3. Always leave SELinux in enforcing. Unlearn "setenforce 0" and use "semanage permissive -a [domain_t]" to only put specific SELinux domains into permissive mode.
  4. Run as staff_u (if you need to sudo) or user_u (if you don't). You can switch using "usermod -Z staff_u [username]". The change requires logout/login to take effect.
  5. Use long, easy to remember and type passphrases instead of short, hard to remember and to type passwords. Don't reuse important passwords anywhere. Change them every now and again.
I don't list physical security measures, since those usually are out of your hands, but it basically goes "don't let attackers get a hold of your systems, because then all bets are off." :) Technologies such as secure boot and disk encryption go a long way towards easing a lot of concerns, but they, too, are merely deterrents.

Thursday, December 27, 2012

Limitations of Google Authenticator pam module

I have blogged about totpcgi in the past, and one of the common questions I hear back is "why not just use Google's own PAM module?" There are three main reasons:

1. Google's PAM module does not scale securely


Google Authenticator is just an implementation of OATH's TOTP, which stands for "Time-based One-Time Password." It really just boils down to a way to convert a timestamp into a 6-digit number using a preshared secret known only to your phone and to the authentication server. The important part here is the words "One-Time" -- an attacker should not be able to reuse the password after it's been used once. But this is where the PAM module falls short.
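To make the "timestamp into a 6-digit number" part concrete, here is a minimal sketch of the TOTP calculation as described in RFC 6238, assuming the usual defaults of HMAC-SHA1 and a 30-second time step. The function name and the demo secret are mine and purely illustrative (the secret is the RFC's published test value), so treat this as an illustration of the algorithm rather than the code of any particular implementation.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, timestamp: float, step: int = 30, digits: int = 6) -> str:
    """Derive a TOTP code (RFC 6238, HMAC-SHA1) for a Unix timestamp."""
    # The moving factor is the number of whole time steps since the epoch.
    counter = int(timestamp) // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte
    # selects which 4 bytes of the digest become the code.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


# Published RFC test secret; never use a well-known value in production.
DEMO_SECRET = b"12345678901234567890"
```

Calling totp(DEMO_SECRET, time.time()) on both the phone and the server yields the same code for the duration of one time step, which is exactly why the server has to remember which codes it has already accepted.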

Google's PAM module works by installing a file containing the secret, scratch tokens, plus a few other parameters (more on that later) into either the user's home directory or some other location on the system you are trying to secure via 2-factor authentication. The default configuration expects to find that file in $HOME/.google_authenticator, but it can also be installed centrally in /etc or /var and made to work via a few PAM tricks.

However, if you have more than one server you are trying to secure (and most admins do), you cannot install the same secret file on multiple servers, as this would break the "One-Time" part of "TOTP." Since the servers don't communicate PAM state data with each other, an attacker can take the token you used to authenticate on serverA, and reuse that token to authenticate on serverB. Since serverB has no way of knowing that the token has already been used on serverA, the authentication succeeds, thus largely defeating the purpose of 2-factor authentication.
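The replay problem is easy to demonstrate with a toy model. The sketch below is not the PAM module's code: the TokenVerifier class, the user name and the token value are all hypothetical, and the only thing it models is whether the "already used" state is kept per server or shared.

```python
class TokenVerifier:
    """Toy verifier that accepts a (user, token) pair once per state store."""

    def __init__(self, valid_tokens, used=None):
        self.valid_tokens = valid_tokens  # user -> token valid right now
        # Each verifier gets its own "used" set unless one is passed in.
        self.used = used if used is not None else set()

    def verify(self, user, token):
        if self.valid_tokens.get(user) != token:
            return False
        if (user, token) in self.used:
            return False  # replay detected
        self.used.add((user, token))
        return True


valid = {"alice": "287082"}

# Same secret on two servers, each tracking used tokens on its own.
server_a = TokenVerifier(valid)
server_b = TokenVerifier(valid)
first_login = server_a.verify("alice", "287082")
replay_on_b = server_b.verify("alice", "287082")

# Same secret, but both servers consult one shared state store.
shared_state = set()
central_a = TokenVerifier(valid, shared_state)
central_b = TokenVerifier(valid, shared_state)
central_first = central_a.verify("alice", "287082")
central_replay = central_b.verify("alice", "287082")
```

With separate state, replay_on_b succeeds even though the token was already spent on serverA; with shared state, central_replay is rejected.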

The only way to remain secure is to use different secrets on serverA and serverB, but clearly this does not scale past 2-3 nodes and 2-3 users. If you have 10 users and 10 servers, you have to provision and keep track of a hundred secrets.

The only other way to remain secure when using the Google Authenticator PAM module is to store the secret and state data in some central location, such as on a shared filesystem. But here we run into another problem.

2. Google Authenticator stores the secret and state data in the same file


I'm rather surprised at this architectural decision -- usually you want to separate your configuration data from your state data and keep them in separate locations -- if only because this will minimize the risk that a failed attempt at saving your state will overwrite your Google Authenticator secret and thus lock you out.

This is an architecture choice that comes from the era before SELinux or other mandatory access control systems. On a system with "classic" unix permissions based on users and groups, a root process reading /etc/passwd would be able to also write to /etc/passwd, but this is no longer true on an SELinux-enabled system. These days we can tightly limit which files the processes are able to write to, and which only read -- even if the process is running as root.

So, if we were trying to distribute token files to multiple systems in order to be able to (securely) use the same preshared secrets on multiple servers, the fact that both the secret and state data are in the same file would rule out NFS and many similar networked filesystems -- we don't want the secrets travelling in cleartext over the wires. For the central storage to be secure, you'd have to find a stable and reliable networked filesystem that supports encryption in transit, and that suddenly makes the problem really complicated as such filesystems are few in number. It also doesn't solve the last problem.

3. Your preshared secrets are only as good as your weakest system


Even if you found a way to securely distribute Google Authenticator state data across multiple servers, you'd still have the problem that a root-level security incident on any one of your servers would require that all Google Authenticator tokens are immediately reissued. Not much of a concern if you only have a few users, but if you have dozens or hundreds of them, this becomes a nightmare scenario. You can't have your entire infrastructure grind to a halt while you feverishly reissue tokens to users only because someone did something dumb on "wwwtest02" and an attacker managed to gain root-level access and could read all preshared secrets.

So, all of the above was what prompted us to come up with totpcgi, which securely centralizes Google Authenticator-based 2-factor authentication infrastructure, plus adds a few features we liked, such as a way to accept "password+token" in one field, plus optionally encrypt the preshared secret with the user's password -- so that even if something goes wrong and the attacker is able to read all preshared secrets, the information will not be of any use without also knowing the users' passwords.
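As an illustration of the "password+token" single-field idea, here is a small sketch of how such a field could be split. This is my assumption about one reasonable approach (the token is taken to be the trailing six digits), and the function name is hypothetical; it is not lifted from totpcgi itself.

```python
def split_password_and_token(field: str, token_length: int = 6):
    """Split 'password+token' input into (password, token).

    Assumes the one-time token is the last token_length characters.
    """
    if len(field) <= token_length:
        raise ValueError("field is too short to hold a password and a token")
    password, token = field[:-token_length], field[-token_length:]
    if not (token.isascii() and token.isdigit()):
        raise ValueError("trailing characters are not a numeric token")
    return password, token
```

The appeal of this design is that existing login forms and PAM prompts with a single password field keep working without modification.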

Monday, December 10, 2012

Using rsyslog with Netapp's snaplock

Netapp's "snaplock" technology allows one to create "write-once, read-many" ("WORM") volumes that allow data to be written but not modified or deleted -- especially not if "compliance" mode is used. While this is not a true "WORM" -- it's still done entirely in software and therefore can theoretically be hacked -- it adds an extra layer of security to your infrastructure, especially if you already rely on NetApp appliances for your NAS needs.

Any file stored on a snaplock volume can be given WORM protection by first doing a "touch" and setting the atime to a date in the future, and then setting a read-only mode on the file. If after setting "read-only" on a file you give it a read-write permission, the file will be put into "append" mode -- data can be appended to the file, but no previously stored data can be modified or deleted (data is "locked" in 256K chunks).
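The sequence of operations described above can be sketched in a few lines. This is only an illustration of the touch/chmod steps, not the cron script mentioned in this post; the function names, the file modes and the 365-day retention default are assumptions of mine. On an ordinary filesystem these calls merely change the atime and the mode bits; the WORM behaviour comes from the snaplock volume itself.

```python
import os
import time


def precreate_append_only(path: str, retain_days: int = 365) -> None:
    """Create an empty file and walk it through the steps described above."""
    with open(path, "a"):
        pass  # make sure the (empty) file exists
    mtime = os.stat(path).st_mtime
    retain_until = time.time() + retain_days * 86400
    os.utime(path, (retain_until, mtime))  # atime set to a future date
    os.chmod(path, 0o444)  # read-only: commits the file to WORM
    os.chmod(path, 0o644)  # read-write again: "append" mode on snaplock


def lock_finished_log(path: str) -> None:
    """Set read-only on a log that will no longer be appended to."""
    os.chmod(path, 0o444)
```

A daily job would call precreate_append_only() for the upcoming log file and lock_finished_log() for yesterday's.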

This is extremely handy for storing system logs or copies of emails for archival purposes. We have a syslog aggregator that receives all our system logs, including auditd. I wrote a simple script that runs out of /etc/cron.daily that pre-creates the "append-only" locations for rsyslog to write to, plus sets read-only on yesterday's logs. Here's the code:

On rsyslog's side of things you'll need something like the following in order to write to these locations:

And, finally, you'll need to do some SELinux manipulations in order to allow rsyslog to write to the NFS location, such as setting the mount context to var_log_t. If you do that, then you'll need the following SELinux policy in order to allow the cron script from above to run:

Tuesday, July 31, 2012

Duplicity as stateful rsync

Problem

Let's say you have a very large master mirror serving hundreds of gigabytes of data -- most of it in small files that don't change very often. Let's also say you have 100 hosts that want an exact copy of that data -- full or partial -- and they want it mirrored as quickly as possible so that any changes on the master are quickly available on the hosts.
What do you do?

Rsync

Evidently, you would use rsync. However, you quickly realize that it is extremely inefficient due to its stateless nature and is therefore really poor at scaling up.
The way rsync works is by comparing various file attributes between the master mirror and the host. If you have a million files that you must mirror, on each run it will check a million files. If you have a hundred hosts that want to replicate your master data every fifteen minutes, you will be examining the same data 400,000,000 times each hour, reducing your master mirror to tears (or at least the administrator of that mirror).
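The arithmetic behind that number, spelled out with the figures from the paragraph above (the variable names are just for illustration):

```python
files_on_master = 1_000_000   # files rsync has to examine on every run
mirroring_hosts = 100         # hosts replicating from the master
runs_per_hour = 60 // 15      # one run every fifteen minutes

checks_per_hour = files_on_master * mirroring_hosts * runs_per_hour
print(f"{checks_per_hour:,} file checks per hour")
```

Note that the cost grows with the total number of files and hosts, not with the number of files that actually changed.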

Rsync --write-batch

You try to use rsync --write-batch on the master mirror and copy that file to your hosts instead. However, this quickly becomes a mess. If one of the mirroring hosts is down for maintenance and misses the propagation window, it will need to have multiple batches applied in order to catch up, or just use plain rsync to the mirror. If you have any number of hosts up or down, then you have to write thousands of lines of code just to keep track of which batch files can be applied and which must revert to rsync.
Moreover, if one of the hosts only wants a subset of your data, they will end up downloading a lot of batch data that they don't care about.

So, what do you do?

Basically, you are stuck. There is not a single open-source solution (that the author is aware of) that will allow one to efficiently propagate file changes.
The solution I am proposing doesn't yet exist, but I believe that it's a feature that can be easily added to an existing tool that is already extremely good at keeping track of changes and creating bandwidth-efficient deltas.

What is duplicity

From the project's web page:
Duplicity backs directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. Because duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. Because duplicity uses GnuPG to encrypt and/or sign these archives, they will be safe from spying and/or modification by the server. The duplicity package also includes the rdiffdir utility. Rdiffdir is an extension of librsync's rdiff to directories -- it can be used to produce signatures and deltas of directories as well as regular files. These signatures and deltas are in GNU tar format.
Let's ignore the encryption bits for the moment -- that is an optional switch and duplicity can create unencrypted volumes. After duplicity establishes a baseline of your backup, it will then create space-efficient deltas of any changes and store them in clearly designated, timestamped files -- "difftars". It will also create a full file manifest with checksums of all files to make it easy to figure out during next run which files were changed or added.

How does that help?

Think of it in the following terms -- instead of publishing full trees of your files on the master, you publish a duplicity backup repository. In order to create the initial mirror, your hosts will effectively do a "duplicity restore", which will download all duplicity volumes from the master and do a full restore.
Here's where we get to the functionality that doesn't yet exist. Once the initial data mirror is complete, your hosts will periodically connect to the master to see if new duplicity manifest files are available. If there are new manifest files, that means there is new data to download. The hosts will then use manifest files to download the difftars with change deltas, and effectively "replay" them on their local backup mirrors.
Periodically, hosts can also go through the file signatures in manifest files to make sure that the local tree is still in sync with the master mirror. Since all manifests carry the mapping of files to difftars, if for some reason there are local discrepancies, hosts can download the necessary volumes from the master mirror in order to correct any errors.
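To show how little state a host would have to keep, here is a rough sketch of the catch-up logic. Everything in it is hypothetical: duplicity has no such API today, and the Manifest structure, its field names and the volume names are invented for illustration. The only state the host remembers is the timestamp of the last increment it applied.

```python
from dataclasses import dataclass, field


@dataclass
class Manifest:
    """Hypothetical view of one published increment."""
    timestamp: int  # when the master published this increment
    # Maps a difftar volume name to the file paths it carries changes for.
    volumes: dict = field(default_factory=dict)


def pending_volumes(manifests, last_applied, subtree=""):
    """Return (timestamp, volume) pairs still to be replayed, oldest first.

    Increments at or before last_applied are skipped; volumes touching
    nothing under subtree are skipped as well.
    """
    todo = []
    for manifest in sorted(manifests, key=lambda m: m.timestamp):
        if manifest.timestamp <= last_applied:
            continue
        for volume, paths in sorted(manifest.volumes.items()):
            if any(path.startswith(subtree) for path in paths):
                todo.append((manifest.timestamp, volume))
    return todo


published = [
    Manifest(100, {"vol1": ["pub/a/y"], "vol2": ["pub/b/z"]}),
    Manifest(200, {"vol1": ["pub/b/q"]}),
    Manifest(300, {"vol1": ["pub/a/x"]}),
]
```

A host that was down since increment 100 asks for pending_volumes(published, 100) and replays two volumes; a host that mirrors only "pub/b/" passes that as the subtree and never downloads the rest.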

Nice! What else is awesome about this?

  • Encryption is built into duplicity, so if you wanted to, you could encrypt all your data.
  • Duplicity supports a slew of backends, including ftp and rsync, which are already provided by large mirrors.
  • Duplicity creates difftars in volumes (25MB in size by default, but easily configurable). If one of the hosts needs to carry only a subtree of the master mirror, they can use the manifest mapping to only download the difftars containing the changes they want.
  • If one of the hosts needs to update the mirror once a day instead of every 15 minutes, they can just replay multiple accumulated difftars.
  • If one of the hosts was down for a week, they can replay a week's worth of difftars to catch up.
  • If the master does a full rebase, the hosts can easily recognize this fact and use the manifest and signature files to figure out which volumes out of the new full backup they need to download.

So, what's the catch?

There are two that I'm aware of:
  1. Duplicity does not work like that yet. I believe it would be fairly easy to either extend duplicity proper to support this functionality, or write a separate tool that would build on top and pretty much "import duplicity" (yes, it is written in python).
  2. You effectively need twice the room on the master mirror -- one for the actual trees and another for duplicity content. This is a small price to pay if the benefit is dramatically reduced disk thrashing.

What can I do to help?

I started a discussion on this topic on the duplicity-talk mailing list. Please feel free to join!
If you read that thread, you will see that I'm willing to obtain funding from the Linux Foundation to help develop this feature either as part of duplicity proper or as a separate add-on tool.
If you write this feature, all mirror admins in the world will instantly love you.

Thursday, May 24, 2012

Senior Systems Administrator at the Linux Foundation

Hello, Fedorans!

There is a fairly large project joining the Linux Foundation in the near future and we are looking for a US- or Canada-based senior systems administrator to join our Linux Foundation IT team. We are looking for the following skills:
  • Excellent knowledge of RHEL-6
  • Familiarity with NetApp appliances
  • Good knowledge of networking
    • Vlans, iptables, ipv4, ipv6, etc
    • HP ProCurve switches
    • Juniper routers a plus
  • Yum, RPM
  • Puppet + Func
  • SELinux
  • Load balancing using nginx, haproxy, LVS
  • OpenVPN
  • Git (and gitolite)
  • KVM+Qemu+libvirt/virsh
  • Postfix+Dovecot
  • Apache, PostgreSQL
Perks:
  • Work from home
  • Receive excellent benefits
  • Attend LF conferences in fun places
  • Do cool things with cool people
  • Feel awesome about your work
If that sounds good, please send your resume to tbd-job@linuxfoundation.org.

Monday, May 14, 2012

My simple IT support rules

  1. Don't be an asshole.
  2. If someone is being an asshole to you, see #1.
  3. Are they upset at something? Identify and fix. Repeat as required.
  4. Assholes are infectious and easy to flare. Avoid them, unless #3.

Sunday, April 29, 2012

Higher education as investment opportunity


It strikes me that most people in favour of tuition hikes view higher education as a net loss paid by their taxes, rather than as an investment that will bring high dividends in the future. It is my wish that more people approached higher education funding like venture capitalists approach startups -- as an investment rather than as a cost. Let me explain what I mean.

Statistically, 8 out of 10 startups will fail, costing venture capitalists millions. However, the 2 that succeed will more than cover the losses on the other 8, with lots of extra profit on top, which is why the VCs continue to do it.

Opponents of free higher education tend to point out how many students have trouble finding jobs after they graduate, especially those who chose to major in humanities. However, if we look at it from the same perspective as venture capitalists, it doesn't matter that many students who receive higher education end up working minimum-wage jobs. We as a society reap our monetary and cultural benefits from those few who do succeed.

Averages can be tricky, but we shouldn't ignore them. On average, people with higher education tend to earn more money. People who earn more money pay more in taxes. A province with more highly educated people will be on average more prosperous than a province with fewer highly educated people. This goes beyond just monetary prosperity -- a lot can be said for culture and just plain joie-de-vivre. Money doesn't tend to stay for very long in dull, desolate places.

But this isn't to say that we shouldn't be smart about investing our money. It doesn't make sense to invest tens of thousands into someone's doctorate degree just to see them move to some other province or country, leaving us to foot the bill. Similarly, it doesn't make sense for all universities to charge the same tuition fees. Some are world-renowned institutions, while others are humble community colleges. One size doesn't fit all.

By changing how we allocate funds, we can both allow universities to charge tuition based on their needs, and hedge our bets to ensure that the money we spend on higher education brings dividends back into our pockets, instead of the pockets of some other provinces or countries.

So, how do we do it?

The government will provide students with a no-interest, inflation-indexed loan to complete their studies. Upon graduation, the government will establish a simple repayment scheme:
  1. If the person resides in the province but has no income, the government simply re-indexes the loan to keep up with inflation.
  2. If the person resides in the province and has income, the government repays a part of their loan. The sum repaid reflects the amount paid in provincial taxes.
  3. If the person leaves the province, the loan is sold to a commercial bank of their choice. The taxpayers are fully reimbursed and repaying the loan becomes that individual's responsibility.
Simple enough?

Let's say Pierre goes to McGill to become a family doctor:
  • McGill charges him $10,000 a year for 5 years of studies, resulting in a $50,000 loan with the government. 
  • After graduating, Pierre has trouble finding work for a couple of years, so his loan remains with the government and grows by 1-2% a year, reflecting inflation. 
  • In his third year, Pierre finds a job and earns $60,000 in income, paying $10,000 in provincial taxes. That year, the government pays a $2,000 instalment towards the loan. 
  • Five years later, Pierre earns $150,000 a year, paying $30,000 in provincial taxes and the government pays a $6,000 instalment towards the loan. 
  • A year later, Pierre finds a job in the US and moves to work there. The remainder of his loan is sold to a commercial bank, which pays off the loan to the taxpayers.

As you see, this is a net win to the government (and us, as taxpayers), since not only did we not lose any money on Pierre's education, but earned many times as much in taxes, not to mention benefited greatly from his skills as a practising doctor.

Let's take another example:
  • Pauline goes to UQAM to become a teacher. UQAM charges her $8,000 a year for 3 years. Her loan with the government is $24,000. 
  • Upon graduation, she finds a job right away, but only earns $35,000 a year for the first few years, paying $4,000 in provincial taxes. The government pays the minimal instalment of $1,000 towards her loan. 
  • 10 years later, Pauline earns $60,000 a year and pays $10,000 in taxes. Her loan is reimbursed at $2,000 annually. 
  • Pauline never leaves Quebec and her loan is fully paid off in 17 years, except she'd completely forgotten that she had a loan at all, as she's never had to worry about payments. 
  • As far as Pauline is concerned, she never paid a dime for going to school (which, of course, she did, via her taxes).

And, let's take a third example:
  • Michael goes to Duke University in North Carolina and finishes his physics degree in 4 years, taking out a loan of $150,000 to pay for his studies.
  • He finds out about a "Come work for us, we'll help pay your student loans" program in Quebec and finds a job with Hydro for $90,000 a year, paying $20,000 in provincial taxes. The Quebec government gives him a $4,000 annual tax credit towards his student loan payments.
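The figures in the three examples above all fit one simple rule: the instalment is 20% of the provincial taxes paid, with a $1,000 minimum for anyone who has income. The 20% rate and the minimum are inferred from those figures rather than stated anywhere in the proposal, so the sketch below treats both as adjustable assumptions, and the function name is made up.

```python
def annual_instalment(provincial_taxes: int, balance: int,
                      rate_percent: int = 20, minimum: int = 1000) -> int:
    """Amount the government pays towards a resident's loan this year."""
    if provincial_taxes <= 0 or balance <= 0:
        return 0  # no income: the loan is only re-indexed to inflation
    payment = max(provincial_taxes * rate_percent // 100, minimum)
    return min(payment, balance)  # never pay more than is still owed
```

For Pierre's first working year this gives annual_instalment(10_000, 50_000), i.e. $2,000, and for Pauline's early years annual_instalment(4_000, 24_000) falls back to the $1,000 minimum.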

Lastly, to prevent anyone from "perpetually staying in school," we can choose an arbitrary ceiling, such as $80,000, after which no more loans will be issued by the government.

This scheme is simple, transparent, and assures that any amount we spend on higher education is a true investment into the province that stays in the province, all without weighing down our youth with heavy loans at precisely the age when what they need most is a boost.

Wednesday, April 18, 2012

Puppet eyes


Have you ever worked in an environment where some files are in puppet, but not all of them? E.g. when you aren't starting from scratch but have to "puppetize" an existing infrastructure?

Have you ever had to wonder "wait, is this file in puppet?"

Put puppet-eyes.vim in /root/.vim/plugin (or in /usr/share/vim/vimfiles/plugin if you want to enable this globally) and VIM will display an alert when you are trying to edit a file managed by puppet.

Note: To work, the file definition in puppet must have "checksum => md5".

Project page on github

Wednesday, March 28, 2012

Restrict the gitolite user with SELinux

One of the things I have been working on is adding SELinux user profiles to all of our non-system users. Most recently, I wrote a custom SELinux role for the gitolite user, to further restrict what it is able to do. As I've had a hard time finding the right resources online, I figured I'd write it up here.

By default, all users are running as unconfined, meaning that things act pretty much as if SELinux was disabled. Included in the SELinux policy are 4 SELinux user profiles:
  • staff_u: can sudo
  • user_u: can do most things regular users can do
  • xguest_u: can run X and some applications, but not get on the network
  • guest_u: can't even run X, but can move files around
I'm not going into much detail -- see Fedora SELinux docs for more detailed distinctions between these roles. For the gitolite user, I wanted to put the same restrictions as guest_u, except allow it to transition to the gitosis_t domain (gitolite used to be known as "gitosis," so the policy name stuck).

Let's start by writing a new user policy for our gitolite user. I call it mygitoliteuser_u, and the policy will be in the file mygitoliteuser.te:
policy_module(mygitoliteuser, 1.0.0)

require {
    type system_mail_t;
    type postfix_postdrop_t;
}

role mygitoliteuser_r;

role mygitoliteuser_r types { system_mail_t postfix_postdrop_t };

userdom_restricted_user_template(mygitoliteuser)
gitosis_run(mygitoliteuser_t, mygitoliteuser_r)
gen_user(mygitoliteuser_u, user, mygitoliteuser_r, s0, s0)
A few things going on here:
  1. We base it off the userdom_restricted_user_template(), which is what guest_u uses.
  2. We allow it to run gitolite via the gitosis_run() interface.
  3. We additionally let it send email. Note that theoretically this should be covered by the mta_role() interface, but it wasn't doing the right thing for me.
To compile and load the policy, run:
make -f /usr/share/selinux/devel/Makefile mygitoliteuser.pp
semodule -i mygitoliteuser.pp
Now set up the contexts for the new mygitoliteuser_u:
cd /etc/selinux/targeted/contexts/users
sed 's/guest_/mygitoliteuser_/g' guest_u > mygitoliteuser_u
Now you need to assign this profile to the gitolite user:
usermod -Z mygitoliteuser_u gitolite
Now here is where things get annoying. Once you do this, don't try to run "restorecon /var/lib/gitolite", as this will screw up the labels on everything in that directory and label it as user_home_t. You see, all currently released versions of semanage assume that if a user has a real shell, its home directory needs to be labelled as user_home_t, which is sane reasoning, but doesn't work for things like the gitolite user. There is a fix for this behaviour in libsemanage 2.1.5 -- you can set ignoredirs=/var/lib/gitolite in /etc/selinux/semanage.conf, but this is not helpful on RHEL6.

Anyway, the only real solution currently is to set up a cronjob that would make sure that everything in /var/lib/gitolite is labelled as gitosis_var_lib_t. I used puppet for this purpose:
file { '/var/lib/gitolite':
  seltype => 'gitosis_var_lib_t',
  recurse => true,
}
That's about it. I may as well share my tweaks to the default gitosis policy here:
policy_module(mygitosis, 1.0.0)

require {
  type gitosis_t;
  type gitosis_exec_t;
  type tmp_t;
  type ssh_home_t;
  type bin_t;
  type fs_t;
}

# required by fork
allow gitosis_t gitosis_exec_t:file execute_no_trans;

# used by hooks (usually here-docs)
allow gitosis_t tmp_t:dir { write remove_name add_name };
allow gitosis_t tmp_t:file { write getattr read create unlink open };

# these appear bogus
dontaudit gitosis_t bin_t:file setattr;
dontaudit gitosis_t fs_t:filesystem getattr;

optional_policy(`
  mta_send_mail(gitosis_t)
')