Experiments in owning data, part 1
parent
cd17762593
commit
88091a4040
189
blog/2019-01-20-experiments-in-owning-data.markdown
Normal file
@ -0,0 +1,189 @@
---
layout: post
title: "Experiments in owning data"
date: 2019-01-20
comments: true
tags: freebsd
---

I have been working for a while on owning most of the data I generate. I thought
I would write down what I mean by that and how I am doing so far.

Before this effort, most of my data was spread across many proprietary
services, some free, some paid. I always felt I had limited control over it,
and I had to find out about some free-tier [restrictions the hard
way](https://www.quora.com/What-happens-to-your-older-photos-once-you-go-over-the-free-200-limit-on-Flickr-without-turning-pro).

So this started as an effort to

1. organize all the things that I wanted in a central (virtual) place, and
2. have fine-grained control over who has access to the data.

However, all of the services that I looked into were built on exactly what I
wanted to avoid: a free service that monetizes my personal data. They do take
my money to provide "upgrades", and then my data may not be mined (or maybe it
is; there is no way to verify that). Service behavior was also sometimes opaque
and confusing, even [causing people to lose
data](https://productforums.google.com/forum/#!topic/apps/RIHSJ4LIXwE).

Another thing that stood out was how inflexible these services were. Most are
designed as big monoliths that do not play well with others. For example, Google
Photos is really nice, but what if I want to run an ImageMagick script over all
of the photos I have? I think there is some way to do this if you poke at the
Photos API, but the friction is too high compared to just mounting them over
WebDAV or FUSE. For a lot of these services Linux was a second-class citizen,
and FreeBSD an undiscovered species. I understand these are not common
requirements, but I wanted the system to work with the things I use and have.

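To make the "just mount them" point concrete, here is a minimal sketch of such a batch job. It assumes the photos are already mounted under a local directory and that ImageMagick's `convert` is installed; the function names, the paths, the thumbnail size and the `DRY_RUN` switch are all made up for illustration.

```sh
#!/bin/sh
# Sketch: run an ImageMagick command over every JPEG under a mounted photo
# directory. All names and paths here are illustrative assumptions.

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# make_thumbnails PHOTO_DIR OUT_DIR
# Writes a copy of each *.jpg, scaled to fit within 1024x1024, into OUT_DIR.
make_thumbnails() {
    photo_dir=$1
    out_dir=$2
    run mkdir -p "$out_dir"
    find "$photo_dir" -type f -name '*.jpg' | while IFS= read -r photo; do
        run convert "$photo" -resize 1024x1024 "$out_dir/$(basename "$photo")"
    done
}

# Example (hypothetical mount point):
#   DRY_RUN=1 make_thumbnails /mnt/photos /tmp/thumbs
```
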
## Hardware
At the moment, what I call my personal data is ~500GB: all the pictures,
emails, documents, code and other things that I have. Assuming a 3-fold growth
(probably too low?) I decided that I need around 2.5TB of storage. Other
requirements were:

1. connected to reasonably fast internet and reliable power
2. cheap (remember, migrating out of this system is going to be really painful)

After some consideration I decided not to host the hardware myself: I move
around a lot, and the state of home internet in Germany is not where I'd like
it to be.

The storage requirements made most of the cloud providers infeasible (_3TB of
EBS is ~$350/month_).

I finally settled on a physical machine from the Hetzner [server
auction](https://www.hetzner.com/sb). The server auction is where they sell
their older-generation machines (read: Sandy Bridge/Ivy Bridge) at a steep
discount. I was able to get a Xeon E3 with 32GB ECC RAM and 2x3TB disks for
30 EUR a month.

It could have been a bit cheaper if I had gone with an i7 machine (newer CPU
too) instead, but those don't support ECC RAM. Intel is adamant about not
supporting ECC in "desktop class" processors.

## Installation
Installation was a piece of cake. Hetzner allows you to boot the server into
`freebsd rescue mode`, where they make the server PXE boot from an
[`mfsbsd`](https://mfsbsd.vx.sk/) image and let you ssh in, and then you can
start installing `FreeBSD` (one can follow a similar procedure for Linux
distros with a Linux rescue image).

## Security
Even though the main goal is to avoid mass surveillance, I also wanted to avoid
data leaks caused by unplanned events: me not paying bills, hardware failures,
etc. The solution was to encrypt the disks, so that nobody can read the data
off them at rest.

This became a challenge because getting access to a KVM console in the Hetzner
environment is not instant. One needs to send them a request, and a human mails
you KVM access credentials that are valid for an hour (they are usually fast
though). Every time I reboot the server I would need to get KVM access, type in
my password over KVM (I am also not sure how much I can trust the encryption of
that session) and let the machine boot.

### Two Zpools approach
However, a friend of mine had the solution. The idea is to have two
[zpool](https://en.wikipedia.org/wiki/ZFS)s: one unencrypted, which holds the
OS, and one encrypted, which holds the data.

Both of the zpools are in
[raid1](https://en.wikipedia.org/wiki/Standard_RAID_levels), meaning they are
mirrored across the two physical disks; as long as both disks don't fail
together, we won't have any problems.

```
disk1
+------------------------+
| pool1| pool2           |
| unenc| enc             |
+------------------------+
disk2
+------------------------+
| pool1| pool2           |
| unenc| enc             |
+------------------------+
```

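The post does not show how the encrypted pool was created, so the following is only a sketch of one way to build such a layout with `geli` and `zpool`. It assumes the encrypted partitions are `ada0p4` and `ada1p4` and that the key files live under `/boot/keys` on the unencrypted pool (both taken from the scripts further down); the pool name `encrypted`, the 4096-byte sector size, and the `run`/`setup_encrypted_mirror` helpers are assumptions made for illustration. It defaults to a dry run that only prints the commands, because the real ones are destructive.

```sh
#!/bin/sh
# Sketch only: GELI-encrypt one partition per disk and mirror them with ZFS.
# Device names, key paths and the pool name are assumptions, not the
# author's recorded setup. Run with DRY_RUN=0 to actually execute.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_encrypted_mirror POOL PARTITION...
setup_encrypted_mirror() {
    pool=$1
    shift
    vdevs=""
    for part in "$@"; do
        key="/boot/keys/${part}.key"
        # Random key file; combined with a passphrase that geli asks for.
        run dd if=/dev/random of="$key" bs=64 count=1
        run geli init -s 4096 -K "$key" "/dev/${part}"
        # Attaching creates the decrypted provider /dev/<part>.eli.
        run geli attach -k "$key" "/dev/${part}"
        vdevs="$vdevs ${part}.eli"
    done
    # $vdevs is left unquoted on purpose so each provider is its own argument.
    run zpool create "$pool" mirror $vdevs
}

setup_encrypted_mirror encrypted ada0p4 ada1p4
```
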
Roughly, this is how it works: when the machine boots, it boots off the plain
zpool and gets to the custom rc script `geli0` that we installed:

```
#!/bin/sh
#

# PROVIDE: geli0
# BEFORE: disks
# REQUIRE: initrandom
# KEYWORD: nojail

. /etc/rc.subr

name="geli0"
start_cmd="geli0_start"
stop_cmd=":"
required_modules="geom_eli:g_eli"

geli0_start()
{
    # Bring up just enough of the system to accept an ssh login.
    zfs mount -av
    /etc/rc.d/hostid start
    /etc/rc.d/hostname start
    /etc/rc.d/netif start
    /etc/rc.d/routing start
    /etc/rc.d/sshd start

    echo -n "Waiting for zpool:encrypted to become available, "
    echo -n "press enter to continue..."
    echo

    # Poll until both GELI providers are attached. The 5 second read
    # doubles as the sleep; pressing enter on the console skips the wait.
    while true; do
        if [ -e /dev/ada0p4.eli ] && [ -e /dev/ada1p4.eli ]; then
            break
        fi
        read -t 5 dummy && break
    done

    # Stop the early services again so the regular boot can start them.
    /etc/rc.d/sshd stop
    pkill sshd
    /etc/rc.d/routing stop
    /etc/rc.d/netif stop
    # /etc/rc.d/devd stop
}

load_rc_config $name
run_rc_command "$1"
```

This script pauses the boot, sets up some essential services related to
`network` and `ssh`, and waits for the encrypted partitions to become
available. The machine is essentially waiting for me to decrypt the disks, and
I can do that by ssh-ing into the box and running `decryptvol.sh` (contents
below):

```
#!/bin/sh

#
# The passphrase for both disks is the same.
# Read it once and decrypt the disks.
#

set -e

# Restore terminal echo even if the script exits early.
trap 'stty echo' EXIT

echo -n "Enter passphrase: "
stty -echo
IFS="" read -r passphrase
stty echo
echo

# Quote the passphrase and use printf so that whitespace, glob characters
# and leading dashes reach geli unchanged.
printf '%s\n' "$passphrase" | geli attach -k /boot/keys/ada1p4.key -j - /dev/ada1p4
printf '%s\n' "$passphrase" | geli attach -k /boot/keys/ada0p4.key -j - /dev/ada0p4
```

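From the client side, unlocking after a reboot could look roughly like the sketch below. The host name, the remote location of `decryptvol.sh`, and the `run`/`unlock_server` helpers are placeholders, not details from this setup. The one real constraint is `ssh -t`: the script turns terminal echo off with `stty`, so it needs a terminal on the remote end.

```sh
#!/bin/sh
# Sketch: unlock the server from a laptop after a reboot.
# Host name and remote script path are hypothetical placeholders.

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# unlock_server HOST [REMOTE_SCRIPT]
unlock_server() {
    host=$1
    remote_script=${2:-./decryptvol.sh}
    # -t forces a terminal so the remote passphrase prompt can hide input.
    run ssh -t "$host" "$remote_script"
}

# Example (hypothetical host):
#   unlock_server root@server.example.org
```
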
As soon as the disks are available, the `geli0` script resumes the regular
boot, but now with access to the encrypted data.

## Conclusion and part 2
With this setup I have a place to store my data, and it's secure from data
mining by third-party service providers. One bit that worries me is that
someone could coerce Hetzner into attacking the hardware itself, but I am not
sure it's something I can solve at the moment.

However, this is only a part of the puzzle. Strictly speaking I now have my
data platform, so to speak, and I need services that integrate it with other
devices that generate and consume data. This post is already longer than I
anticipated, so I will write about software and other services in a follow-up.