Founsure 1.0
PUBLICATIONs
Publication information is available in Elsevier SoftwarX web page here.
INTRODUCTION
Founsure 1.0 is the very first version of a totally new erasure coding library using Intel's SIMD and multi-core architectures based on simple aligned XOR operations and fountain coding. The details of fountain coding is beyond the scope of this text. I encourage you to take a look at here. Founsure does not use specific SIMD instructions, it rather pays attention to memory alignment and leaves the assembly instructions to be optimed by the "magical" gcc/g++ compiler (with appropriate optimization flags). This library is expected to have comparable performance to other open source alternatives and yet shall provide distinguishably better repair and rebuilt performance which seem to be the main stream focus of erasure coding community nowadays. The version 1.0 has only two major functions "founsureEnc" and "founsureDec" for encoding and decoding, respectively. In other words, the function "founsureRep" for efficient repair is still missing. So the library is not COMPLETE and the documentation is probably not at a fulfilling stage at the moment. Given the time constraint i am doing my best to get them upto date as much as i can.
Here are the reasons why "Founsure" is developped and exists:
■ Founsure is based on fountain codes which are not space-optimal i.e., non-zero coding overhead. However, if you want your erasure code (even if you use space-optimal codes such as Reed-Solomon codes) to have efficient repair capabilities, it is unavoidable to have non-zero coding overhead. So some overhead is actually necessary for storage applications.
■ Founsure is exposed in non-sytematic form and thus natural encryption is embedded inside the code (supported by seeds).
■ Founsure is flexible i.e., if one wants to have a combination of erasure coding and replication, it is possible with founsure to have that by just setting the right parameters. Ofcourse this would require experience and familarity with the internals of the theory & library.
This project is currently supported by government-based TUBITAK 2232 project funding and we expect EU funding to step in as well in the near future.
Founsure 1.0 implements an LT code and a concatenated fountain code (a precode + LT code). More specifically, the precode is based on a systematic binary array codes whose parity check matrix exhibits sparseness as the block length gets large. The precode is concatenated by an LT code based on different degree and selection distributions. More precode support shall be provided in future releases. Also custom degree/selection distributions will be part of the library for flexibility reasons.
The online home for Founsure 1.0 is here. The documentation is not available yet, but it should be here when it is. We provide a summary of the documentation below for now.
DOCUMENTATION
There are four major functions of Founsure: encoding, decoding, repair and update.
There are also utility functions provided with the library to better use the library for different use cases. For instance, simDisk utility function has nothing to do with a user file and can be used to help system admins provide required reliability with their own system.
■founsureEnc:
This function takes a file and produces multiple segments of data. The file itself does not appear in any of these segments.-
■founsureDec:
- This function assumes a ./Coding directory and a subset of encoded file segments to be available and decodes and re-generates the original file.
-
■simDisk:
- This utility function can be used to find fault tolerance of Founsure in terms of the number of failed disks that can be tolerated. This function is of great value to map founsure parameters such as k, n to number of disks and numer of tolerable disk failures and provide a way to understand fault tolerance numbers as in MDS codes (such as Jerasure or Intel's ISA libraries which are based on RS code implementions).
- DOWNLOAD
Founsure is available on GitHub on github/founsure
INSTALLATION
Installation from the source: There are four directories of source code:
* The src directory contains the founsure source code.
* The examples directory contains the example programs.
* The man page for the entire founsure software.
* The scripts page for running tests.
If you do not have autotools installed, you can simply install it using:
sudo yum install autotools autoconf (for centos), sudo apt-get install autotools-dev autoconf (for ubuntu)
Make sure you run autreconf before you run the usual installation procedure outline below.
■ autoreconf --force --install
■ ./configure
■ make
■ sudo make install
This will install the library into your machine's lib directory, most probably /usr/local/lib.
The configuration process assumes shared objects are searched for in /usr/local/lib. If this is not the case on your system, you can specify a search path at configuration time. Alternatively if you receive the following error when you try to execute one of the main functions of the library to do encoding or decoding, setting the LD_LIBRARY_PATH to installed directory (For instance /usr/local/lib) and exporting it might help. Otherwise, let me know the specifics of the problem, I will give my own shot.
-> error while loading shared libraries: libfounsure.so.2: cannot open shared object file: No such file or directory
PERFORMANCE (coding overhead)
I have only tested the library for few cases that is related to coding overhead . So more extensive evaluations are needed and will be published when i find time. Let me know if you want to give me a hand.
Experimentation parameters: Let us use a file of 100MiB and 10 disks over which we intend to distribute data. (n,k) parameters of the system with the appropriate precode and LT degree distributions are as shown for each table. Note that the parameter t does not effect the overhead performance. There is also a seed number that effects the performance (which is predefined for now to be 1389488782). Columns with integer headings show the number of lost disk data and the performance measures the number of combinations of that number lost disk data. The rate of the code is an indicator of the overhead. In this experiment, we intend to recover half of the lost data so the ideal rate is 0.5 (the overhead optimum). The non-optimality of the founsure 1.0 diminishes as the parameters (n,k) get larger as shown. However. the price we pay is the complexity of encoding and decoding operations.
Fountain Code | n | k | Rate | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|---|
FiniteRaptor+ArrayLDPC | 2180 | 1036 |
0.4752 | 0/45 | 0/120 | 5/210 | 252/252 | All Comb. |
RSD | 2180 | 1024 | 0.469 | 0/45 | 0/120 | 2/210 | 222/252 | All Comb. |
Fountain Code | n | k | Rate | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|---|
FiniteRaptor+ArrayLDPC | 21800 | 10246 |
0.47 | 0/45 | 0/120 | 0/210 | 2/252 | All Comb. |
RSD | 21800 | 10240 | 0.47 | 0/45 | 0/120 | 0/210 | 17/252 | All Comb. |
Fountain Code | n | k | Rate | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|---|
FiniteRaptor+ArrayLDPC | 43600 | 20497 |
0.4701 | 0/45 | 0/120 | 0/210 | 0/252 | All Comb. |
RSD | 43600 | 20480 | 0.4697 | 1/45 | 11/120 | 100/210 | 222/252 | All Comb. |
Note that the last parameter selection leads to tolerance against all-five combinations. The price being paid using fountain codes is the coding overhead i.e., 0.03 or 3% rate loss relative to the optimal (which is 0.5 for our example). As (n,k) gets large the complexity of the encoding and decoding increase. If we desire to have faster speeds, we can simply pay off by allowing more overhead penalty. The following table assumes smaller (n,k) and yet provide the same level of partition tolerance at the expense of more overhead.
Fountain Code | n | k | Rate | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|---|
FiniteRaptor+ArrayLDPC | 2900 | 1036 |
0.3572 | 0/45 | 0/120 | 0/210 | 0/252 | 168/210 |
RSD | 2900 | 1045 | 0.3603 | 0/45 | 0/120 | 1/210 | 11/252 | 64/210 |
Note that this selection provides a tolerance level beyond all-five combinations! The coding overhead can be expressed as 14% rate loss which is way higher than previous 3%. However, the complexity is dramatically reduced.
PERFORMANCE (complexity)
Various performance results are published and compared to some of the most competetive softwares available such as Jerasure, ISA-L and Z-fec. Please see the paper.