---
layout: post
title: "Experiments in owning data"
date: 2019-01-20
comments: true
tags: freebsd
---

I have been working for a while to own most of the data I generate. I thought I
would write down what I mean by that and how I am doing so far.

Before this effort, most of my data was spread across many proprietary
services, some free, some paid. I had always felt I had limited control over
them, and I had to find out about some free-tier [restrictions the hard
way](https://www.quora.com/What-happens-to-your-older-photos-once-you-go-over-the-free-200-limit-on-Flickr-without-turning-pro).

So this started as an effort to:

1. organize all the things that I wanted in a central (virtual) place, and
2. have fine-grained control over who has access to the data.

However, all of the services I looked into were built around exactly what I
wanted to avoid: a free service that monetizes my personal data. They will take
my money for "upgrades", and then my data may or may not be mined; there is no
way to verify either way. Service behavior was also sometimes opaque and
confusing, even [causing people to lose
data](https://productforums.google.com/forum/#!topic/apps/RIHSJ4LIXwE).

Another thing that stood out was how inflexible these services were: mostly big
monoliths that don't play well with others. For example, Google Photos is really
nice, but what if I want to run an ImageMagick script over all of my photos?
There is probably some way to do this by poking at the Photos API, but the
friction is too high compared to just mounting them over WebDAV or FUSE. For a
lot of these services Linux was a second-class citizen, and FreeBSD an
undiscovered species. I understand these are not common requirements, but I
wanted the system to work with the things I use and have.

## Hardware

At the moment, what I call my personal data is ~500GB: all my pictures, emails,
documents, code and other things. Assuming a 3-fold growth (probably too low?),
I decided that I need around 2.5TB of storage. Other requirements were:

1. connected to reasonably fast internet and reliable power, and
2. cheap (remember, migrating out of this system is going to be really painful).

After some consideration I decided not to host the hardware myself: I move
around a lot, and the state of home internet in Germany is not where I'd like it
to be.

The storage requirement made most of the cloud providers unfeasible (_3TB of EBS
is ~$350/month_).

I finally settled on a physical machine from the Hetzner [server
auction](https://www.hetzner.com/sb). The server auction is where they sell
older-generation machines (read: Sandy Bridge/Ivy Bridge) at a steep discount. I
was able to get a Xeon E3 with 32GB of ECC RAM and 2x3TB disks for 30 EUR a
month.

It could have been a bit cheaper had I gone with an i7 machine (a newer CPU,
too) instead, but those don't come with ECC RAM. Intel is very adamant about not
supporting ECC in "desktop class" processors.

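A quick sanity check on the storage numbers, as a sketch: disks are sold in decimal terabytes, while ZFS tools such as `zpool list` report sizes in powers of 1024, and a two-way mirror only exposes one disk's worth of space. The helper name `tb_to_tib` is made up for illustration.

```shell
#!/bin/sh
# tb_to_tib: convert a size in decimal terabytes (10^12 bytes, as disks are
# sold) to binary tebibytes (2^40 bytes). Hypothetical helper for illustration.
tb_to_tib() {
    awk -v tb="$1" 'BEGIN { printf "%.2f\n", tb * 1e12 / 2^40 }'
}

# In a two-way mirror the usable space is one disk's worth, before the OS
# pool and ZFS overhead are taken out.
echo "usable TiB for a mirrored pair of 3 TB disks: $(tb_to_tib 3)"
echo "2.5 TB target in TiB: $(tb_to_tib 2.5)"
```

With these numbers a mirrored pair of 3 TB disks shows up as roughly 2.73 TiB, which leaves some headroom over the 2.5 TB (about 2.27 TiB) target before the OS pool is carved out.
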
## Installation

Installation was a piece of cake. Hetzner lets you boot the server into
`freebsd rescue mode`, where they point the server to PXE boot from an
[`mfsbsd`](https://mfsbsd.vx.sk/) image and let you ssh in; from there you can
start installing `FreeBSD` (one can follow a similar procedure for Linux distros
with a Linux rescue image).

## Security

Even though the main goal is to avoid mass surveillance, I also wanted to avoid
data leaks caused by unplanned events: me not paying the bills, hardware
failures, etc. The solution was to encrypt the disks, so that nobody can sniff
data out of them at rest.

This became a challenge because getting KVM access in the Hetzner environment is
not instant. One needs to send them a request, and a human mails you KVM access
credentials valid for an hour (they are usually fast, though). That means every
time I reboot the server I would need to get KVM access, type in my passphrase
over KVM (and I am not sure how far I can trust that channel with an encryption
passphrase) and let the machine boot.

### Two Zpools approach

However, a friend of mine had the solution: the idea is to have two
[zpools](https://en.wikipedia.org/wiki/ZFS), one unencrypted that holds the OS
and another, encrypted, that holds the data.

Both zpools are in [RAID1](https://en.wikipedia.org/wiki/Standard_RAID_levels),
meaning they are mirrored across the two physical disks, so as long as both
disks don't fail at the same time, no data is lost.

```
  disk1
+------------------------+
| pool1    | pool2       |
| unenc    | enc         |
+------------------------+
  disk2
+------------------------+
| pool1    | pool2       |
| unenc    | enc         |
+------------------------+
```

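The layout in the diagram could be produced with something along these lines. This is a sketch, not the commands actually used on this server: the partition sizes, the swap partition and the OS pool name `zroot` are assumptions; only the fact that partition 4 (`ada0p4`/`ada1p4`) holds the encrypted data comes from the scripts in this post. Because `gpart create` on the wrong disk is destructive, the sketch only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Partitioning sketch for the two-pool layout. Sizes, the swap partition and
# the pool name "zroot" are assumptions. Commands are only printed unless
# DRY_RUN=0 is set in the environment.
set -e
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

partition_disk() {
    disk=$1
    run gpart create -s gpt "$disk"
    run gpart add -t freebsd-boot -s 512k "$disk"   # p1: ZFS boot code
    run gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 "$disk"
    # p2: swap; listing it as e.g. ada0p2.eli in /etc/fstab makes FreeBSD
    # encrypt it with a one-time key, so decrypted data never hits disk in clear.
    run gpart add -t freebsd-swap -s 4g "$disk"
    run gpart add -t freebsd-zfs -s 50g "$disk"     # p3: unencrypted OS pool
    run gpart add -t freebsd-zfs "$disk"            # p4: rest of disk, for GELI
}

for disk in ada0 ada1; do
    partition_disk "$disk"
done

# Mirrored, unencrypted pool for the OS (pool1 in the diagram).
run zpool create -o altroot=/mnt zroot mirror ada0p3 ada1p3
```

Installing the base system into `zroot` (datasets, `bootfs` and so on) is left to the installer; the sketch only covers the split into a plain and an encrypted partition on each disk.
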
Roughly, this is how it works: when the machine boots, it boots off the plain
zpool and reaches the custom rc.d script `geli0` that we installed:

```sh
#!/bin/sh
#

# PROVIDE: geli0
# BEFORE: disks
# REQUIRE: initrandom
# KEYWORD: nojail

. /etc/rc.subr

name="geli0"
start_cmd="geli0_start"
stop_cmd=":"
required_modules="geom_eli:g_eli"

geli0_start()
{
	# Mount the datasets of the plain pool and bring up just enough of
	# the system (host id, hostname, network, routes, sshd) to log in.
	zfs mount -av
	/etc/rc.d/hostid start
	/etc/rc.d/hostname start
	/etc/rc.d/netif start
	/etc/rc.d/routing start
	/etc/rc.d/sshd start

	echo -n "Waiting for zpool:encrypted to become available, "
	echo -n "press enter to continue..."
	echo

	# Wait until both GELI providers are attached (by decryptvol.sh),
	# re-checking every 5 seconds; pressing enter on the console skips
	# the wait.
	while true; do
		if [ -e /dev/ada0p4.eli ] && [ -e /dev/ada1p4.eli ]; then
			break
		fi
		read -t 5 dummy && break
	done

	# Tear down the temporary sshd and network; the regular rc scripts
	# start them again later in the boot. pkill also ends the ssh
	# session that was used to run decryptvol.sh.
	/etc/rc.d/sshd stop
	pkill sshd
	/etc/rc.d/routing stop
	/etc/rc.d/netif stop
	# /etc/rc.d/devd stop
}

load_rc_config $name
run_rc_command "$1"
```

This script pauses the boot, starts the few services needed for networking and
`ssh`, and waits for the second set of disks to become available. The machine
is essentially waiting for me to decrypt the disks, which I can do by ssh-ing
into the box and running `decryptvol.sh` (contents below):

```sh
#!/bin/sh

#
# The passphrase for both disks is the same.
# Read it once and decrypt the disks.
#

set -e

# Restore terminal echo however the script ends (including Ctrl-C).
trap 'stty echo' EXIT
trap 'exit 1' INT TERM

echo -n "Enter passphrase: "
stty -echo
IFS="" read -r passphrase
stty echo
echo

# printf instead of an unquoted echo, so spaces or glob characters in the
# passphrase reach geli unchanged; "-j -" reads the passphrase from stdin.
printf '%s\n' "$passphrase" | geli attach -k /boot/keys/ada1p4.key -j - /dev/ada1p4
printf '%s\n' "$passphrase" | geli attach -k /boot/keys/ada0p4.key -j - /dev/ada0p4
```

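Since `decryptvol.sh` turns terminal echo off with `stty`, it needs a TTY, so over ssh it has to be run with `ssh -t`. A small client-side helper could wait for the temporary sshd started by `geli0` and then run it. This is a hypothetical sketch: the `root` login, the path `/root/decryptvol.sh` and the host name in the usage line are assumptions, and the wait loop assumes key-based authentication (`BatchMode=yes` makes ssh fail instead of prompting for a password).

```shell
#!/bin/sh
# Client-side unlock helper (hypothetical). Waits until the box's temporary
# sshd answers, then runs decryptvol.sh with a forced TTY so stty works.
set -e

unlock() {
    host=$1
    # Poll every 5 seconds until a non-interactive login succeeds.
    until ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true 2>/dev/null; do
        sleep 5
    done
    ssh -t "root@$host" /root/decryptvol.sh
}

# usage: ./unlock.sh my-server.example.org
if [ $# -gt 0 ]; then
    unlock "$1"
fi
```
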
As soon as the disks are available, the `geli0` script resumes the regular
boot, now with access to the encrypted data.

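`decryptvol.sh` assumes the two `p4` partitions were prepared once, each with a keyfile under `/boot/keys` plus a passphrase. That step isn't shown in the post; a minimal sketch could look like this. The 4096-byte sector size, the 64-byte random keyfile and the pool name `encrypted` (taken from the message printed by `geli0`) are assumptions. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# One-time GELI setup sketch for the data partitions. Each partition gets its
# own random keyfile; geli init and geli attach additionally prompt for the
# passphrase, so both the key and the passphrase are needed to decrypt.
# Commands are only printed unless DRY_RUN=0 is set in the environment.
set -e
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

init_part() {
    part=$1
    key=/boot/keys/$part.key
    run dd if=/dev/random of="$key" bs=64 count=1
    run geli init -s 4096 -K "$key" "/dev/$part"
    run geli attach -k "$key" "/dev/$part"
}

run mkdir -p /boot/keys
for part in ada0p4 ada1p4; do
    init_part "$part"
done

# Mirrored pool on top of the decrypted .eli providers (pool2 in the diagram).
run zpool create encrypted mirror ada0p4.eli ada1p4.eli
```

Since the keyfiles live on the unencrypted pool, for anyone holding the disks it is the passphrase that really protects the data; keeping copies of the keyfiles and of the GELI metadata backups (by default `geli init` writes them under `/var/backups`) off the server is what keeps the pool recoverable.
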
## Conclusion and part 2

With this setup I have a place to store my data, and it is safe from data mining
by third-party service providers. One thing that worries me is that someone
could coerce Hetzner into attacking the hardware itself, but I am not sure that
is something I can solve at the moment.

However, this is only part of the puzzle. Strictly speaking, what I have now is
a data platform; next I need services that integrate it with the other devices
that generate and consume data. This post is already longer than I anticipated,
so I will write about the software and other services in a follow-up.