Safer

The [SAFER] project

Welcome to the [SAFER] Project Research & Development Page!

[SAFER] is a system that combines a Network Attached Storage device and a Peer-to-peer backup service.

The following diagram shows what [SAFER] is all about in one big picture (read below for more explanations!)

What makes [SAFER] different? After all, quite a number of NAS solutions already exist, ranging from “professional” systems like Synology ™ or NetApp ™ to Open Source projects like FreeNAS, NAS4Free or OpenMediaVault. P2P backup has been around for a while, with systems like Wuala, Storj, and many more. Besides, there is a plethora of cloud-based backup solutions, and some of them are not so expensive: CrashPlan, BackBlaze, OVH, …

Well, for residential use, all those systems and solutions still have issues!

Let’s try to list the most important of them:

  • Professional NAS solutions are very reliable, but that level of quality comes at a cost.
  • Open Source NAS projects still require some expertise.
  • P2P backup systems use a lot of bandwidth.
  • Cloud-based solutions require you to send them all your data, which raises privacy and safety concerns.
  • Online backup solutions often require one subscription per computer, or a higher price for the whole household, and most of them do not accept NAS-level backup.
  • Online backup solutions are often all or nothing (you restore all your data when you experience a crash).

Not all existing systems have all of these issues, but none of them, to our knowledge, is free of all of them at once.

The lack of a perfect solution is therefore the motivation for this (yet another) new generation system. Our design goals are the following:

  • Low cost hardware (current target: under 100 euros per terabyte of storage)
  • Low cost service (at least 10 times cheaper than the cheapest cloud service)
  • User friendly (plug and play, no stupid questions, zero-admin, operate with a TV remote)
  • Social friendly (share your resources and others will share with you for a free or reduced service cost)
  • Bandwidth friendly (network bandwidth consumption can be adjusted depending on time of day)
  • Bundle a NAS with a P2P backup (serves the needs of all the family household)
  • No single point of failure (backup data are spread across hundreds of peers, with lots of redundancy and a strong incentive to stay on-line without interruption)
  • Preserve privacy (data are encrypted at the source and spread across many peers, such that no one else can intercept them)
  • Multi-level backups (the NAS can be used for local backup of residential devices, and the backups can be backed up using P2P)
  • Virtually indestructible NAS (if your hardware device fails, it can easily be restored from the Peers, after replacing the failed parts)
  • Incremental backup (back up only the data that did not exist or were modified since the last backup)
  • Versioning (lets you navigate and retrieve backups through time, like Time Capsule ™)
  • Deduplication (Redundant copies of a file are detected and backed up only once)
  • Compression (Transparent compression of data such that more can be saved on disks)

Architecture

The [SAFER] NAS device

The NAS device is connected to the residential network, where it acts as both a network storage and a multimedia server.

Our recommended NAS hardware is no more than a standard, low cost X86_64 PC platform with SATA III support.

64-bit ARM platforms can also be used, but the cheapest ones, such as the RPi and its derivatives, lack SATA III.

From a hardware point of view, the NAS device is a standard PC, equipped with a network connection (preferably 1 Gb/s) and hard drive(s). This is the only requirement for a “pure” NAS configuration. In particular, it should be noted that since we aim at coupling this hardware with a reliable and high-performance backup service, the hardware itself does not have to meet the highest reliability requirements, a desired property of most commercial NAS systems that justifies their higher cost.

For a more advanced “HTPC” configuration, in which the NAS is directly connected to a TV or a home cinema, the PC is expected to have good audio/video capabilities and connections. In the latter case, for aesthetic reasons, the PC should have a small form factor and look nice. Last, but not least, given that the device has to be on-line without interruption, it should preferably have low energy consumption, which has the nice side effect of producing low noise levels. Finding a configuration that meets such requirements requires a bit of research, but is perfectly feasible.

Lab: 16 experimental NAS configs under construction

Our current experimental HTPC platform is an example. It is composed of the following elements and costs less than 400 Euros:

  • AsRock J3160 mini-ITX motherboard (fanless, low TDP)
    • Intel LOW TDP chipset, well supported by Linux
    • 4 x SATA internal connectors
    • 1 x PCIExpress connector
    • 6 USB3, DisplayPort + HDMI + DVI, 7.1 Audio
  • InWIN BM639 mini-PC case
    • Small form factor
    • Room for 2×3.5” HDD + 1xslim ODD + 1×2.5” HDD
    • Includes a 160W FlexATX power supply
    • NB: the case fan can safely be disconnected, given the very low thermal dissipation. In this case, the noisiest elements of the configuration are the hard drives (rotation vibrations)
  • 2 x 2 TB “NAS” hard drives
  • 2 x 2 GB SO-DIMM SDRAM
  • Additional options tested:
    • Slim ODD (DVD/BluRay)
    • Internal WiFi 802.11AC/Bluetooth 4

The [SAFER] Service

The [SAFER] backup service uses a Peer-to-Peer approach to ensure a high level of backup reliability at a dramatically low cost, while keeping a high level of performance and privacy.

The service needs a (modest) initial subscription, to be renewed yearly. This subscription serves several purposes: pay for basic infrastructure costs, assign credentials to users, and avoid “spam” traffic. The [SAFER] system is available either preinstalled or, for the brave, as a downloadable specialized Linux distribution (Ubuntu-based).

Subscribed users can further proceed to their dedicated download area, where they will find a bundle archive containing their P2P connection credentials and initialization data. Once the initialization bundle is installed on the NAS, the system is ready. No tuning is required by default. However, some adjustments can be made for comfort, such as setting the amount of bandwidth that should be used depending on the time of day.

The effective upload bandwidth (as observed by the system) made available to other peers is critical: it defines the amount of free backup storage that you can get from the system. Indeed, [SAFER] will not allow a customer with very little bandwidth to save large amounts of data: saving data for others makes sense only if others can retrieve their data in a reasonable amount of time. The sharing figures are as follows: the upload bandwidth unit (UBU) is 100 Kbits/s. A permanent availability of 1 UBU on average for 10 consecutive days allows a customer to contribute 1 TB of its local storage to other peers during the next 10-day period. The average is computed on a daily basis, but for a given time of day, it has to be stable for the whole 10-day period. For example, a customer may decide to offer 6 UBUs during the night and office hours, when nobody uses the residential network, and none during the evenings and mornings, when everybody at home is busy on the Internet. If, for some reason, the observed bandwidth is only 5 UBUs from 4 to 6 AM on a couple of days, then the retained offered bandwidth is 5 UBUs for the whole 10-day period that contains those days. In the end, if the daily average availability is 4 UBUs, then this customer is allowed to contribute at most 4 TB of its storage to others.
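The sharing rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual [SAFER] accounting code: the function name, the hourly granularity of the measurements, and the reading that the worst value observed for each time-of-day slot is the one retained for that slot are all ours.

```python
UBU_KBITS = 100   # one upload bandwidth unit (UBU), in Kbits/s (from the text)
TB_PER_UBU = 1    # 1 UBU sustained over the period allows 1 TB (from the text)

def storage_allowance_tb(observed_kbits):
    """Return the storage (in TB) a customer may contribute to others.

    observed_kbits[d][h] is the upload bandwidth observed on day d during
    hour h (hypothetical hourly sampling, one list of 24 values per day).
    """
    hours = range(len(observed_kbits[0]))
    # Assumption: for each time-of-day slot, the retained bandwidth is the
    # lowest value observed for that slot over the whole period.
    retained = [min(day[h] for day in observed_kbits) for h in hours]
    # Daily average of the retained slots, converted to UBUs, then to TB.
    average_ubu = sum(retained) / len(retained) / UBU_KBITS
    return average_ubu * TB_PER_UBU

# 6 UBUs during the night and office hours (16 hours), nothing otherwise.
nominal_day = [600 if (h < 7 or 9 <= h < 18) else 0 for h in range(24)]
period = [list(nominal_day) for _ in range(10)]
print(storage_allowance_tb(period))            # 4.0

# Same period, but only 5 UBUs observed from 4 to 6 AM on two of the days.
dipped = [list(nominal_day) for _ in range(10)]
for day in (2, 3):
    dipped[day][4] = dipped[day][5] = 500
print(round(storage_allowance_tb(dipped), 2))  # 3.92
```

With this reading, a short dip only lowers the slots in which it was observed, which is why the allowance drops slightly below 4 TB rather than to zero.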

The amount of storage that is actually contributed depends on the demand, and it is also evaluated on a 10-day slot basis. If the system was online and passed the continuous integrity checks, then the customer earns storage credits. Storage credits can then be used to purchase storage from other peers. In case a customer does not have enough credits, two options are proposed:

  • purchase credits from the [SAFER] operator
  • drop some of its backup data.

To help deal with the latter case, [SAFER] lets customers assign priorities to their backups. Indeed, the NAS space is divided into volumes, each having a dedicated purpose (1st level backups, videos, photos, etc.), and each volume is assigned a priority level, namely “KEEP” or “DROP(n)”, where n defines a relative priority. A volume tagged “KEEP” will never be dropped, while a volume tagged DROP(i) will be dropped once all volumes tagged DROP(j) with j>i have been dropped.
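As an illustration of the rule above, the order in which volumes would be dropped can be computed as follows. This is a minimal sketch: the volume names, the `KEEP` marker and the `drop_order` function are hypothetical and only mirror the KEEP / DROP(n) tags described in the text.

```python
KEEP = None  # hypothetical marker for a volume tagged "KEEP"

def drop_order(volumes):
    """Return the names of droppable volumes, first to be dropped first.

    volumes maps a volume name to KEEP or to the integer n of its DROP(n) tag.
    """
    droppable = {name: n for name, n in volumes.items() if n is not KEEP}
    # DROP(i) is dropped only once every DROP(j) with j > i is gone,
    # so volumes are dropped by decreasing n.
    return sorted(droppable, key=lambda name: droppable[name], reverse=True)

volumes = {"photos": KEEP, "videos": 3, "1st-level-backups": 1, "music": 2}
print(drop_order(volumes))  # ['videos', 'music', '1st-level-backups']
```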

Since being able to contribute storage for others is critical to help reduce costs, [SAFER] implements fair-share algorithms, such that all peers get a fair chance to save for other peers, and therefore reduce their own costs.

[SAFER] uses a certain level of redundancy to compensate for the potential loss of peers. The actual amount of redundancy is still subject to more research, but it should be somewhere around 150%, e.g. using a 10+5 Reed-Solomon encoding scheme (in comparison, Backblaze, for example, uses a much more aggressive 17+3 scheme). This redundancy has to be accounted for in the storage budget. For example, in order to save 1 TB of data on peers, a customer needs to contribute at least 1.5 TB of its own storage, or purchase the missing credits from [SAFER].
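The storage budget implied by a k+m Reed-Solomon scheme is simple arithmetic: every k data fragments are stored as k+m fragments, and any m of them can be lost without losing data. The helper names below are ours, and the 10+5 default is only the tentative figure quoted above.

```python
from fractions import Fraction

def storage_factor(data_shards, parity_shards):
    """Ratio of stored bytes to original bytes for a k+m erasure code."""
    return Fraction(data_shards + parity_shards, data_shards)

def contribution_needed_tb(data_tb, data_shards=10, parity_shards=5):
    """Storage (in TB) to contribute in order to save data_tb on peers."""
    return float(data_tb * storage_factor(data_shards, parity_shards))

print(storage_factor(10, 5))             # 3/2, i.e. 150%
print(contribution_needed_tb(1))         # 1.5
print(float(storage_factor(17, 3)))      # about 1.18 for a 17+3 scheme
```

The 10+5 scheme costs more space than 17+3 but survives the loss of 5 fragments out of 15 instead of 3 out of 20, which matters when fragments live on residential peers.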

The storage credits sold by [SAFER] result in much cheaper storage than the equivalent storage purchased from major Cloud companies, such as Amazon or Microsoft. Indeed, most of the time, [SAFER] will use the internal resources available from peers, in exchange for compensation (e.g. credits or cash-back). In case the internal resources available from peers are not sufficient, [SAFER] will have to purchase storage from traditional cloud operators, using virtual peers (peers running on virtual machines in the cloud). The cost of the storage credit may therefore fluctuate, depending on the available resources.

Design topics

Research problems

The [SAFER] project raises a fair number of research problems. Here is a non-exhaustive list of the ones currently being addressed:

  • What network structure (topology, protocols) should be used for connecting peers?
  • How to distribute data blocks among peers?
  • What security measures/protocols should be used?
  • How to monitor the effective bandwidth contributed by peers?
  • How to protect the credits from forgery?
  • What are the best redundancy parameters?
  • What is the best networking configuration (routing, packet size, protocols, …)?
  • How to ensure fairness?
  • How to assess the reliability of the proposed solution?

Engineering problems

[SAFER] aims at maximizing the reuse of existing software and tools:

  • use IPv6 for building and routing in a custom overlay network
  • use Docker to reduce the vulnerability and improve the flexibility of the software stack
  • implement a reliable data channel capable of NAT traversal
  • reuse existing Linux distributions and packaging support (final decision to be made)
  • use ZFS for advanced file system operations (incremental backups, thin provisioning, striping, …)
  • reuse existing services (Plex, Netflix, …)
  • implement KISS and DRY principles using a robust framework (Python/Django, RoR, Pyramid, … TBD)
  • Build certificate authority using OpenSource libs (OpenSSL?)
  • Deduplication: deduplicating is quite easy on a Unix/Linux box. The issue is more about re-duplicating when a copy is edited (ideally using a CoW technique). A simpler option to start with, in our case, is to not allow modification of deduplicated files.
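To make the deduplication item concrete, here is a minimal file-level sketch, assuming a POSIX file system that supports hard links. It is not the [SAFER] implementation (which may well rely on ZFS instead): the function names are hypothetical, duplicates are detected by SHA-256 digest, replaced by hard links, and the shared file is made read-only, which mirrors the "no modification of deduplicated files" option.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so that large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(root):
    """Return groups (lists of paths) of files under root with equal content."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and not path.is_symlink():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]

def hard_link_duplicates(groups):
    """Keep the first file of each group and hard-link the others to it."""
    for original, *copies in groups:
        for copy in copies:
            os.unlink(copy)
            os.link(original, copy)
        # All names now share one inode: forbid writes so that editing one
        # name cannot silently change the others (no CoW in this sketch).
        os.chmod(original, 0o444)
```

A typical use would be `hard_link_duplicates(find_duplicates("/path/to/volume"))`, where the path is a placeholder. A production version would also compare file sizes first and handle files that change while being scanned.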

Project status

Project Openings and Fundings

The [SAFER] project is on track for industrial transfer in the coming months. If you are a talented engineer or researcher, please do not hesitate to send a CV and references.

Last but not least: the project is currently proudly but modestly supported by the University of Nice Sophia Antipolis.

We are looking for investors! Be smart, this is the future of residential storage!! :-)

Roadmap and Achievements

We’re currently in a prototyping phase for patent application(s).

Here is a list of issues we have successfully addressed so far:

  • Transparent compression: x1.4 factor on a fully automated 1 TB TimeMachine backup (real data). Zero config: simply add the [SAFER] NAS to your TimeMachine targets and off it goes! After completion, the full backup of 950 GB only uses 727 GB of actual space on the NAS.
  • Deduplication: x1.5 factor on a large real data set (a 400 GB photo library, in which we found and deduplicated up to 17 duplicates of the same image without any loss).
  • No manual setup: zero-config solution for traversing NAT. It works great. No need to tweak your routers, open ports, or anything else. Our solution punches holes and establishes direct and safe Peer-to-peer connections from behind your home NAT! Fast and secure.

