Safer

The [SAFER] project

Welcome to the [SAFER] Project Research & Development Page!

[SAFER] is a system that combines a Network Attached Storage device and a Peer-to-peer backup service.

The following diagram shows what [SAFER] is all about in one big picture (read below for more explanations!)

As illustrated by the diagram, the NAS device acts as a concentrator for the data of the whole household: photos, videos, scans, documents, music, and so on. It can also be plugged directly into a TV, which provides a friendly interface as well as multimedia display. The peer-to-peer backup service running on the NAS provides a robust safety net in case something unexpected happens to this otherwise obvious single point of failure. Combining the two results in a virtually indestructible shared network device for the whole family. It is meant to be cheap, yet it can still last forever.

What makes [SAFER] different? After all, there are quite a number of NAS solutions, ranging from “professional” systems like Synology ™ or NetApp ™ to Open Source projects like FreeNAS, NAS4Free or OpenMediaVault, and P2P backup has been around for a while, with systems like Wuala, Storj, and many more. Besides, there is a plethora of cloud-based backup solutions, and some of them are not so expensive: CrashPlan, BackBlaze, OVH, …

Well, the trouble is that, for residential use, all of these systems and solutions still have issues!

Let’s try to list the most important ones:

  • Professional NAS solutions are very reliable, but that level of quality comes at a cost.
  • Open Source NAS projects still require some expertise.
  • P2P backup systems use a lot of bandwidth.
  • Cloud-based solutions require you to send them all your data, which raises privacy and safety concerns.
  • Online backup solutions often require one subscription per computer, or a higher price for the whole household, and most of them do not accept NAS-level backup.
  • Online backup solutions are often all or nothing (you restore all your data when you experience a crash).

Each existing system avoids some of these issues but, to our knowledge, none of them avoids them all.

The lack of a perfect solution is therefore the motivation for this (yet another) new generation system. Our design goals are the following:

  • Low-cost hardware (current target: under 100 euros per terabyte of storage)
  • Low-cost service (at least 10 times cheaper than the cheapest cloud service)
  • User friendly (plug and play, no stupid questions, zero-admin, operated with a TV remote)
  • Social friendly (share your resources and others will share with you, for a free or reduced service cost)
  • Bandwidth friendly (network bandwidth consumption can be adjusted depending on the time of day)
  • Bundle a NAS with a P2P backup (serves the needs of the whole household)
  • No single point of failure (backup data are spread across hundreds of peers, with lots of redundancy and a strong incentive to stay online without interruption)
  • Preserve privacy (data are encrypted at the source and spread across many peers, such that no one else can read them)
  • Multi-level backups (the NAS can be used for local backup of residential devices, and those backups can in turn be backed up using P2P)
  • Virtually indestructible NAS (if your hardware device fails, it can easily be restored from the peers after replacing the failed parts)
  • Incremental backup (back up only the data that did not exist or were modified since the last backup)
  • Versioning (lets you navigate and retrieve backups in time, like Time Capsule ™)
  • Deduplication (redundant copies of a file are detected and backed up only once)
  • Compression (transparent compression of data, such that more can be saved on disk)

Architecture

The [SAFER] NAS device

The NAS device is connected to the residential network, where it acts both as network storage and as a multimedia server.

Our recommended NAS hardware is no more than a standard, low-cost x86_64 PC platform with SATA III support.

64-bit ARM platforms can also be used, but the cheapest ones, such as the RPi and its derivatives, lack SATA III.

From a hardware point of view, the NAS device is a standard PC equipped with a network connection (preferably 1Gb/s) and one or more hard drives. This is the only requirement for a “pure” NAS configuration. In particular, since we aim at coupling this hardware with a reliable, high-performance backup service, the hardware itself does not have to meet the highest reliability requirements. That level of reliability is a desired property of most commercial NAS systems, and it is what justifies their higher cost.

For a more advanced “HTPC” configuration, in which the NAS is directly connected to a TV or a home cinema, the PC is expected to have good audio/video capabilities and connections. In the latter case, for aesthetic reasons, the PC should have a small form factor and look nice. Last but not least, given that the device has to stay online without interruption, it should preferably have low energy consumption, which has the nice side effect of keeping noise levels low. Finding a configuration that meets such requirements takes a bit of research, but is perfectly feasible.

Lab: 16 experimental NAS configs under construction

Our current experimental HTPC platform is an example. It is composed of the following elements and costs less than 400 Euros:

  • AsRock J3160 mini-ITX motherboard (fanless, low TDP)
    • Intel low-TDP chipset, well supported by Linux
    • 4 x SATA internal connectors
    • 1 x PCI Express connector
    • 6 x USB3, DisplayPort + HDMI + DVI, 7.1 audio
  • InWIN BM639 mini-PC case
    • Small form factor
    • Room for 2 × 3.5” HDD + 1 × slim ODD + 1 × 2.5” HDD
    • Includes a 160W FlexATX power supply
    • NB: the case fan can safely be disconnected, given the very low thermal dissipation. In that case, the noisiest elements of the configuration are the hard drives (rotation vibrations)
  • 2 x 2TB/4TB “NAS” hard drives
  • 2 x 2GB SO-DIMM SDRAM
  • Additional options tested:
    • Slim ODD (DVD/BluRay)
    • Internal WiFi 802.11ac/Bluetooth 4

The [SAFER] Service

The [SAFER] backup service uses a peer-to-peer approach to ensure a high level of backup reliability at a dramatically low cost, while keeping a high level of performance and privacy. In a peer-to-peer system, every party acts as both a provider and a client.

Under the hood, for this to work, the system has to ensure fairness: each participant gets free service in proportion to the service they contribute to others. In other words, the system prevents cheaters from getting free service at the expense of others (like leechers in BitTorrent). This fairness policy is at the heart of the [SAFER] design, with a very simple but proven economic approach: participants who provide service to others are rewarded in proportion to their contribution; conversely, participants who use the service are charged in proportion to their overall resource usage. In between, we use a virtual crypto-currency as the means of exchanging service between parties.
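To make the fairness rule more concrete, here is a minimal sketch of a credit ledger in Python. It is an illustration only, not the [SAFER] implementation: the class name, the flat price per GB per day, and the peer names are all assumptions made for the example.

```python
from collections import defaultdict


class CreditLedger:
    """Toy fairness ledger: credits are earned by hosting and spent by storing."""

    def __init__(self, price_per_gb_day=1.0):
        # Hypothetical flat price: credits charged per GB stored for one day.
        self.price = price_per_gb_day
        self.balance = defaultdict(float)

    def settle(self, client, provider, gigabytes, days):
        """Charge the client and reward the provider by the same amount."""
        amount = self.price * gigabytes * days
        self.balance[client] -= amount
        self.balance[provider] += amount
        return amount


ledger = CreditLedger()
ledger.settle("alice", "bob", gigabytes=100, days=30)   # alice stores data on bob
ledger.settle("bob", "alice", gigabytes=100, days=30)   # bob stores data on alice
ledger.settle("carol", "alice", gigabytes=50, days=30)  # carol only consumes
print(dict(ledger.balance))
```

In this toy run, alice and bob exchange equal amounts of service and cancel each other out, while carol, who contributes nothing, ends up with a negative balance that she would have to cover by purchasing credits.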

Internally, this economic system is operated automatically, in the background, using a robust technology for distributed transactions: blockchains. Blockchains have the nice property of making all transactions irreversible: once a transaction is recorded, it is recorded forever and can no longer be erased. (This does not mean that the effects of a transaction cannot be reversed, for instance by a later compensating transaction.)
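The "recorded forever" property can be illustrated with a simple hash chain, sketched below in Python. This is a deliberately reduced model (no consensus, no signatures, no network), and the block layout and function names are assumptions for the example, not the [SAFER] design. Each block stores the hash of the previous one, so altering an old transaction breaks every later link.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, payload):
    """Hash a block's position, its link to the previous block, and its payload."""
    data = json.dumps([index, prev_hash, payload], sort_keys=True).encode()
    return hashlib.sha256(data).hexdigest()


def append_block(chain, payload):
    """Append a transaction; it is bound to everything recorded before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append({"index": index, "prev": prev_hash, "payload": payload,
                  "hash": block_hash(index, prev_hash, payload)})


def is_valid(chain):
    """Recompute every hash and link; any edit to a past block is detected."""
    prev_hash = GENESIS
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, {"from": "alice", "to": "bob", "credits": 10})
# Reversing the *effect* is done by appending a compensating transaction,
# never by erasing the original record.
append_block(chain, {"from": "bob", "to": "alice", "credits": 10})
```

Modifying the payload of the first block after the fact makes `is_valid(chain)` return `False`, which is the tamper-evidence that a real blockchain builds on.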

The service requires a (modest) initial subscription, to be renewed yearly. This subscription serves several purposes: paying for basic infrastructure costs, assigning credentials to users, and avoiding “spam” traffic. The [SAFER] system may come either preinstalled or, for the brave, as a downloadable specialized Linux distribution (Ubuntu-based).

Subscribed users can then proceed to their dedicated download area, where they will find a bundle archive containing their P2P connection credentials and initialization data. Once the initialization bundle is installed on the NAS, the system is ready. No tuning is required by default. However, some adjustments can be made for comfort, such as setting the amount of bandwidth that should be used depending on the time of day.
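As an illustration of the time-of-day adjustment mentioned above, the following Python sketch looks up an upload cap from a small schedule. The rule format, the time windows, and the Mbit/s values are hypothetical; they only show how such a setting could be expressed.

```python
from datetime import time

# Hypothetical defaults: cap in Mbit/s when no rule matches.
DEFAULT_CAP_MBPS = 5.0

# Hypothetical rules: (start, end, cap in Mbit/s); the end bound is exclusive.
RULES = [
    (time(23, 0), time(7, 0), 20.0),  # overnight window, wraps past midnight
    (time(18, 0), time(23, 0), 1.0),  # evening, when the household is online
]


def in_window(now, start, end):
    """True if `now` falls in [start, end), including windows that wrap midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def upload_cap(now):
    """Return the upload cap that applies at the given time of day."""
    for start, end, cap in RULES:
        if in_window(now, start, end):
            return cap
    return DEFAULT_CAP_MBPS


print(upload_cap(time(2, 0)), upload_cap(time(20, 30)), upload_cap(time(12, 0)))
```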

To help users deal with budget constraints, [SAFER] offers two levels of quality of service:

  • Premium service is designed to ensure that a restore operation completes in bounded time. Our current target is that a volume of 1TB can be restored in 10 days at most.
  • Best-effort service is much cheaper, but offers no delay bounds. However, it still ensures the safety of the data.

How do you select the service? No need to panic: [SAFER] promotes the convention-over-configuration approach. Users are never required to tweak their system parameters, because the system already comes with a reasonable default configuration. Still, it is always possible to change the default settings. The default configuration is based on the default organization of the data on the NAS: the storage space is divided into multiple areas, depending on the type of data: photos, videos, documents, music, backup, and so on. The quality of service is assigned by default depending on the area: premium for documents and photos, best effort for videos and backups.
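A convention-over-configuration default of this kind could be sketched as a simple lookup with optional overrides, as in the Python example below. The area names and the function are assumptions for illustration; in particular, falling back to best effort for areas not listed above (such as music) is a guess, not a documented [SAFER] default.

```python
# Defaults stated in the text: premium for documents and photos,
# best effort for videos and backups.
DEFAULT_QOS = {
    "documents": "premium",
    "photos": "premium",
    "videos": "best-effort",
    "backup": "best-effort",
}


def qos_for(area, overrides=None):
    """Return the service level for an area; user overrides win over defaults."""
    if overrides and area in overrides:
        return overrides[area]
    # Assumption: areas without a stated default fall back to best effort.
    return DEFAULT_QOS.get(area, "best-effort")


print(qos_for("photos"))                          # default applies
print(qos_for("videos", {"videos": "premium"}))   # user changed the default
```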

Each user can choose the amount of bandwidth they are willing to contribute. However, the amount of premium storage that one can contribute is linked to the amount of bandwidth made available to others. Indeed, in order to enforce delay bounds, the system needs to ensure that the corresponding network bandwidth requirement can be met. The system therefore has to limit the amount of premium storage that can be sold, depending on the amount of bandwidth that is contributed.
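The link between delay bounds and bandwidth is simple arithmetic, sketched below in Python using the premium target of 1TB restored in 10 days. The sketch assumes decimal units (1TB = 10^12 bytes), a perfectly steady transfer, and no protocol or redundancy overhead, so real requirements would be somewhat higher. The function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def required_rate_mbps(volume_bytes, days):
    """Sustained rate (Mbit/s) needed to move `volume_bytes` within `days`."""
    return volume_bytes * 8 / (days * SECONDS_PER_DAY) / 1e6


def max_premium_bytes(upload_mbps, days):
    """Largest volume that a sustained `upload_mbps` can deliver within `days`."""
    return upload_mbps * 1e6 / 8 * days * SECONDS_PER_DAY


# 1 TB in 10 days needs roughly 9.26 Mbit/s, summed over all serving peers.
print(round(required_rate_mbps(1e12, 10), 2))
```

Read the other way around, a peer contributing a sustained 1 Mbit/s of upload could back about 108GB of premium storage under the same 10-day bound, which is why the amount of premium storage has to be capped by contributed bandwidth.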

Since being able to contribute storage to others is critical to reducing costs, [SAFER] implements fair-share algorithms, such that all peers get a fair chance to store data for other peers, and therefore to reduce their own costs.

[SAFER] uses a certain level of redundancy to compensate for the potential loss of peers. The actual amount of redundancy is still subject to more research, but the stored volume should be somewhere around 150% of the original data, e.g. using a 10+5 Reed-Solomon encoding scheme (in comparison, Backblaze, for example, uses a much more aggressive 17+3 scheme). This redundancy has to be accounted for in the storage budget. For example, in order to save 1TB of data on peers, a customer needs to contribute at least 1.5TB of their own storage, or purchase the missing credits from [SAFER].
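The storage budget for a k+m erasure code follows directly from the scheme parameters: data is split into k fragments, m parity fragments are added, and any k of the k+m fragments are enough to rebuild the data. The Python sketch below only does this bookkeeping (it does not implement Reed-Solomon coding itself); the function names are illustrative.

```python
def storage_factor(k, m):
    """Stored volume divided by original volume for a k+m erasure code."""
    return (k + m) / k


def raw_storage_needed(payload_tb, k, m):
    """Raw storage (TB) consumed on peers to protect `payload_tb` of data."""
    return payload_tb * storage_factor(k, m)


def tolerated_losses(k, m):
    """Fragments that can be lost while the data stays recoverable."""
    return m


print(storage_factor(10, 5))            # 10+5 scheme
print(round(storage_factor(17, 3), 3))  # 17+3 scheme
print(raw_storage_needed(1.0, 10, 5))   # TB needed to save 1 TB with 10+5
```

With 10+5, each 1TB saved consumes 1.5TB on peers and survives the loss of any 5 of the 15 fragments. A 17+3 scheme needs only about 1.18TB but tolerates 3 losses out of 20, which is a better fit for managed datacenter drives than for residential peers that may go offline.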

The storage credits sold by [SAFER] result in much cheaper storage than the equivalent storage purchased from major cloud companies, such as Amazon or Microsoft. Indeed, most of the time, [SAFER] will use the internal resources available from peers, in exchange for compensation (e.g. credits or cash-back). If the internal resources available from peers are not sufficient, [SAFER] will have to purchase storage from traditional cloud operators, using virtual peers (peers running on virtual machines in the cloud). The cost of the storage credit may therefore fluctuate, depending on the available resources.

Design topics

Research problems

The [SAFER] project raises a fair number of research problems. Here is a non-exhaustive list of the ones currently being addressed:

  • What network structure (topology, protocols) should be used for connecting peers?
  • How to distribute data blocks among peers?
  • What security measures/protocols should be used?
  • How to monitor the effective bandwidth contributed by peers?
  • How to protect the credits from forgery?
  • What are the best redundancy parameters?
  • What is the best networking configuration (routing, packet size, protocols, …)?
  • How to ensure fairness?
  • How to assess the reliability of the proposed solution?

Engineering problems

[SAFER] aims at maximizing the reuse of existing software and tools:

  • use IPv6 for building and routing in a custom overlay network
  • use Docker to reduce the vulnerability and improve the flexibility of the software stack
  • implement a reliable data channel capable of NAT traversal
  • reuse existing Linux distributions and packaging support (final decision to be made)
  • use ZFS for advanced file system operations (incremental backups, thin provisioning, striping, …)
  • use reliable distributed transactions (distributed DB or blockchain)
  • reuse existing services (Plex, Netflix, …)
  • implement KISS and DRY principles using a robust framework (Python/Django, RoR, Pyramid, … TBD)
  • build a certificate authority using Open Source libs (OpenSSL?)
  • Deduplication: deduplicating is quite easy on a Unix/Linux box. The harder issue is re-duplicating when a copy is edited (ideally using a CoW technique). A simpler option to start with, in our case, is to disallow modification of deduplicated files.
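As a rough illustration of the deduplication item above, whole-file duplicates can be detected by grouping files on a content hash. The Python sketch below only finds the duplicates; what to do with them (hard links, CoW clones, or a read-only policy) is left open, as in the text. This is one common technique, not necessarily the one used in [SAFER], and the function names are assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's content, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups of paths under `root` whose contents are byte-identical."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates("/path/to/photos")` (a placeholder path) returns one list per set of identical files, so the first entry of each list can be kept as the single stored copy.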

Project status

Project Openings and Funding

The [SAFER] project is on track for industrial transfer in the coming months. If you are a talented engineer or researcher, please do not hesitate to send a CV and references.

Last but not least: the project is currently proudly but modestly supported by the University of Nice Sophia Antipolis.

We are looking for investors! Be smart, this is the future of residential storage! :-)

And for partners, especially cloud operators.

Roadmap and Achievements

We are currently in a prototyping phase, ahead of patent application(s).

Here is a list of issues we have successfully addressed so far:

  • Transparent compression: x1.4 factor on a fully automated 1 TB TimeMachine backup (real data). Zero config: simply add the [SAFER] NAS to your TimeMachine targets and off it goes! After completion, the full backup of 950GB only uses 727GB of actual space on the NAS.
  • Deduplication: 1.5 factor on a large real data set (a 400 GB photo library, in which we found and deduplicated up to 17 duplicates of the same image without any loss)
  • No manual setup: zero-config solution for traversing NAT. It works great. No need to tweak your routers, open ports or anything else. Our solution punches holes and establishes direct and safe peer-to-peer connections from behind your home NAT! Fast and secure.
  • Ubuntu derived distribution: repo server is up (but still hidden)
  • Blockchain: deploying test infrastructure
