Sauvegarde 0.0.12

Sauvegarde saves your data continuously, that is to say while it is being written to disk. It is intended for users who have many files to save and do not want to miss any change to them. The whole backup is deduplicated, so users who run many similar systems or keep copies of the same file on many systems will save space compared with a traditional backup. A specific feature (caching in the client), yet to be implemented, will allow one to back up systems that do not have permanent access to the network (for instance laptop users).

Tags c system continuous data protection backup deduplicating stateless json jansson http libmicrohttpd libcurl
License GNU GPLv3
State beta

Recent Releases

0.0.12 10 Mar 2019 14:54 major feature: v0.0.12 Continuous Data Protection For GNU/Linux (cdpfgl) is also known as the 'sauvegarde' project. It is a set of programs: 'cdpfglserver', 'cdpfglclient' and 'cdpfglrestore' as of now. These programs save your files continuously, that is to say while they are being written to disk. The server 'cdpfglserver' is stateless and deduplicates at the block level (optionally with adaptive block size). As a result it does not use much memory and can run on small machines (for instance I ran one cdpfglserver on a 1 GB Banana Pi). This is the v0.0.12 release of this project; one major feature was added that makes this version not backward compatible. Please let me know that you use this project by sending me an email so that I can take backward compatibility into account.
* New feature:
  * The client now compresses (or not) the data and sends it to the server. The server stores data as transmitted (compressed or not).
  * -z, --compression command line options added to cdpfglclient, and a compression-type=x option, where x can be 0 (none) or 1 (zlib), added to the client configuration file.
* Improved parts:
  * Code refactoring and cleaning improved the code. Expect more code refactoring in the future.
You are encouraged to contribute to this project, for example by telling the author that you are using it. To do so you can open issues and pull requests on GitHub. The project uses GitHub's facilities such as milestones, issues and projects. Milestones are open up to v0.0.16 and some enhancement issues have been opened too.
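The compression behaviour described in the 0.0.12 entry can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's C code: the function names `prepare_block` and `restore_block` are hypothetical, and only the two option values (0 for none, 1 for zlib) come from the release note.

```python
import zlib

# Values documented for the compression-type option in the release note.
COMPRESSION_NONE = 0
COMPRESSION_ZLIB = 1


def prepare_block(data: bytes, compression_type: int) -> bytes:
    """Return the bytes a client would send for one block (hypothetical helper)."""
    if compression_type == COMPRESSION_NONE:
        return data
    if compression_type == COMPRESSION_ZLIB:
        return zlib.compress(data)
    raise ValueError(f"unsupported compression-type: {compression_type}")


def restore_block(stored: bytes, compression_type: int) -> bytes:
    """Undo prepare_block() when restoring; the stored bytes are kept as transmitted."""
    if compression_type == COMPRESSION_NONE:
        return stored
    if compression_type == COMPRESSION_ZLIB:
        return zlib.decompress(stored)
    raise ValueError(f"unsupported compression-type: {compression_type}")
```

Because the server stores blocks exactly as it received them, this sketch assumes the restoring side has to know which compression type was used for a given block.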
0.0.11 12 Sep 2017 19:34 major feature: This is the v0.0.11 release of this project. Some features were added, some were improved and a bug was corrected:
* New features:
  * -n, --hostname option added to the cdpfglrestore program in order to restore a file from a host that did not originally own the file.
  * Messages answering a .json request are now JSON formatted.
  * The url /Stats.json now answers with a JSON string giving some basic server usage statistics.
  * cdpfglserver now prints its listening port when in debug mode.
* Improved parts:
  * Examples are now included in man pages.
  * cdpfglclient now versions its local database to ease future migrations or changes.
  * Code refactoring and cleaning has been done.
* Bug corrected:
  * Commit 375003985549 corrected a bug that made the url statistics be counted twice.
You are encouraged to contribute to this project. To do so you can open issues and pull requests on GitHub. The project uses GitHub's facilities such as milestones, issues and projects. Milestones are open up to v0.0.16 and some enhancement issues have been opened too.
0.0.10 06 Dec 2016 20:09 major feature: Man pages were added for cdpfglclient, cdpfglserver and cdpfglrestore. The code was refactored by simplifying some command line related parts and by moving some generic code into the libcdpfgl library. New options have been added to the cdpfglrestore program: -f, --all-files now restores all selected files (in conjunction with the -r option). -g, --latest selects only the latest version of each selected file. -P, --parents restores files with their full path (creating directories if needed). A new option has been added to the cdpfglclient program: -n, --no-scan deactivates the initial scan of every directory to be saved when launching cdpfglclient. This option is a boolean value named no-scan in the cdpfglclient configuration file. libcdpfgl is now built more cleanly and is also used in the 'content-define-cut' project.
0.0.9 24 Aug 2016 20:20 major feature: Documentation has been improved and the code reviewed a bit (and simplified). Many bugs have been corrected. This version is completely incompatible with older ones. Signals are now trapped in cdpfglclient to allow a correct shutdown of the database connections upon SIGINT and SIGTERM. A new url is available to clients in order to get a bigger chunk of data when restoring files (/Data/Hash_Array.json with the GET method). A new option (-e, --all-versions) has been added that allows one to restore all versions of a specific file (to be used in conjunction with -l or -r).
0.0.8 29 Apr 2016 20:40 major feature: One can now restore a file before, after or at a specific date. A new POST URL (/Hash_Array.json) has been added to try to minimize network traffic when saving big files (while still doing live deduplication). The SQL code has been reviewed a bit and modified to avoid SQL injection problems.
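The release note does not say how the SQL code was changed. One common way to avoid SQL injection with sqlite is to bind values as parameters instead of pasting them into the query string. The sketch below shows that technique in Python; the `files` table, its columns and the `find_versions` helper are hypothetical and are not taken from the project's schema.

```python
import sqlite3


def find_versions(conn: sqlite3.Connection, filename: str):
    """Look up saved versions of a file using a bound parameter.

    The '?' placeholder makes sqlite treat `filename` strictly as data,
    so quotes inside it cannot change the structure of the query.
    """
    cur = conn.execute(
        "SELECT name, mtime FROM files WHERE name = ? ORDER BY mtime",
        (filename,),
    )
    return cur.fetchall()


def demo():
    # Hypothetical in-memory table standing in for a client database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (name TEXT, mtime INTEGER)")
    conn.executemany(
        "INSERT INTO files VALUES (?, ?)",
        [("/home/a.txt", 1), ("/home/a.txt", 2), ("/home/b.txt", 3)],
    )
    hostile = "x' OR '1'='1"  # would match every row if concatenated into the SQL
    return find_versions(conn, "/home/a.txt"), find_versions(conn, hostile)
```

With parameter binding the hostile string is simply a file name that matches nothing, so the second lookup returns no rows.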
0.0.7 03 Jan 2016 16:55 major feature: The project is still named 'sauvegarde' on GitHub but I will name it Continuous Data Protection For GNU/Linux (cdpfgl) as much as I can. All the programs now use the cdpfgl acronym (even libsauvegarde has been renamed libcdpfgl). An effort has been made in the source code to track down old French names such as 'serveur', 'restaure' and so on. In the same way I renamed the 'Serveur' section to 'Server' in configuration files, so configuration files from v0.0.6 or older are not compatible with v0.0.7 and may have to be changed manually. Ability to exclude some files by extension or path: this adds a new configuration file option named 'exclude-list=' in the Client section. It takes some basic regular expressions such as those given as examples in the 'client.conf' file. It also adds a new cdpfglclient command line option called '-x' or '--exclude'. The client is threaded and now uses at least 3 threads (one is used only to uncache cached buffers when the server comes alive again). Caching mechanism in the client in case the server is unreachable: this is achieved by using tables in the client's local sqlite database. It breaks compatibility with older versions, i.e. a v0.0.6 database is not usable as is with v0.0.7 (I'll write a migration script upon request). Changed the GSList hash_data_list in the meta_data_t structure to a GList, which allows deleting elements while walking through it at O(1) cost. I also corrected some bugs as they were found. The packaging directory now contains material for packaging for distributions (voidlinux and debian as of now). The Dockerfiles directory contains Dockerfiles to build the whole project on different distributions (centos and voidlinux) in a lightweight way.
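The exclusion feature from the 0.0.7 entry can be pictured with a small sketch. It uses Python's `re` module as a stand-in for whatever regular expression engine the client actually uses, and the helper names and example patterns below are illustrative assumptions, not entries copied from 'client.conf'.

```python
import re


def compile_exclude_list(patterns):
    """Compile the regular expressions of an exclude list once, up front."""
    return [re.compile(p) for p in patterns]


def is_excluded(path: str, compiled) -> bool:
    """Return True when any exclude pattern matches somewhere in the path."""
    return any(rx.search(path) for rx in compiled)


# Hypothetical patterns: exclude by extension and by path component.
EXAMPLE_EXCLUDES = compile_exclude_list([r"\.iso$", r"/\.cache/"])
```

For instance, `is_excluded("/home/user/.cache/thumb.png", EXAMPLE_EXCLUDES)` is true while `is_excluded("/home/user/notes.txt", EXAMPLE_EXCLUDES)` is false, so only the second file would be considered for saving.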
0.0.6 02 Nov 2015 21:29 major feature: Following comments made by Pierre Bourgin when trying to package the sauvegarde project for the voidlinux distribution, the program names changed and are now cdpfglserver, cdpfglclient and cdpfglrestore. 'cdpfgl' stands for 'Continuous Data Protection For GNU/Linux'. Using cdpfglrestore with the -w (--where) option, one can now restore a file to a specific directory given along with the option. cdpfglclient has a new mode to calculate hashes. This mode is called 'adaptative blocksize'; it can be invoked with the -a 1 (--adaptative=1) command line option or by using the 'adaptative=true' directive in the 'Client' section of the client.conf configuration file. This option lets the client calculate hashes with a block size that depends on the size of the file. It works in steps: files whose size is under 32768 bytes are hashed with a 512 byte blocksize, files under 262144 bytes with a 2048 byte blocksize, and so on, up to files whose size is greater than 134217728 bytes, which are hashed with a 262144 byte blocksize. It is believed that, by doing so, deduplication will reach a higher rate. The drawback is that the cdpfglclient program is slower for small files. The -s (--buffersize) option has been added to the cdpfglclient program in order to let one choose the cache size that cdpfglclient may use before sending data to cdpfglserver. This option has no effect when the adaptative blocksize option has been chosen, as the program will then adapt this buffer size roughly to each file. This release also contains many bug fixes and memory leak fixes. The memory allocation strategy has been reviewed at some points: when possible, avoid allocating memory at all; when we must allocate, see if g_malloc() is usable, else use g_malloc0(), which is 1000 times slower than g_malloc().
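The stepwise rule for the adaptative blocksize can be sketched as follows. This Python sketch encodes only the three steps spelled out in the 0.0.6 entry; the intermediate steps are not listed there, so the hypothetical function `adaptive_blocksize` returns `None` for them rather than guessing values.

```python
from typing import Optional


def adaptive_blocksize(file_size: int) -> Optional[int]:
    """Return the block size in bytes for a file size in bytes.

    Only the steps given in the release note are encoded. Sizes that fall
    in the steps the note summarises as "and so on" yield None.
    """
    if file_size < 32768:
        return 512
    if file_size < 262144:
        return 2048
    if file_size > 134217728:
        return 262144
    return None  # intermediate steps are not documented in the release note
```

For example, a 10 000 byte file would be hashed in 512 byte blocks and a 200 MB file in 262144 byte blocks, which shows the trade-off mentioned in the entry: small files produce many small blocks and therefore more hashing work per byte.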
0.0.5 06 Oct 2015 19:48 minor feature: Some improvements were made, as everything that was expected in the roadmap (and more) has been coded. The fanotify code has been reviewed a bit and a thread has been created to process files (if possible). This allows file change notification to begin very early, well before the end of the directory carving. Files that are 128 MB or more are processed differently to avoid holding them completely in memory. The problem for now is that it does a naive transfer of the blocks: it sends everything, so it may waste network bandwidth as well as server and client CPU and IO. There is a simple solution to this that I will code in a future release (planned for v0.0.8). A new test script is used in travis-ci to test things a bit and maybe avoid basic problems. A new icon is available in the pixmap directory, from 16x16 to 512x512, under the CC-BY-SA license. The image is used in publications (there is no GUI as of now). Started a user manual for the sauvegarde project in the 'manual' directory. Development documentation can be found in the 'docs' directory. Added libsauvegarde.pc to allow one to use this library in another project. Corrected 2 major mistakes that prevented the client from being efficient when processing files (to avoid processing them when already processed). As a result the client in v0.0.5 is slower than in v0.0.4 but saves a lot of IO and CPU on the machine it runs on.
0.0.4 06 Sep 2015 20:04 minor feature: There is now a new server url to post a bunch of hashes and associated data (/Data_Array.json). The JSON string expected must contain an array named 'data_array'. Each object of this array must contain the three fields 'hash', 'data' and 'size'. Fields 'hash' and 'size' must be base64 encoded. This has the effect of buffering the communication a bit. My tests on my single computer showed a speed gain of at least 4 times. When the client sends the hashes of a file in its metadata, the server answers with the hashes that it needs (those unknown to it). But if the file contained the same unknown hash several times, the server used to ask for this hash as many times. Now, with v0.0.4, the answer contains only unique hashes, which avoids the client sending the same block several times. file_backend now has a configuration section in the 'serveur.conf' file named 'file_backend'. Two options can be configured: 'file-directory', which tells the backend where to put its files, and 'dir-level', which tells the backend the number of directory levels used to store data. Its default value is 2 (meaning that the server will create 65536 directories). The value is limited to a maximum of 5 (i.e. 256^5 = 1 099 511 627 776 directories!). Keep in mind that creating the directories may take a long time if you choose a high value (it will only be done once) and also that a directory may take some space (on ext4 level 2 takes 256 MB but level 3 takes 64 GB!). sauvegarde is now fully translated into French and is ready for other translations (it is based on .po files). Sébastien Tricaud's patch was merged in this version, adding the ability to catch SIGINT and clean up memory before exiting, avoiding a memory leak. A manual has been created and is waiting for contributions at http://write.flossmanuals.net/sauvegarde-manual/_info/. The TODO file has been reworked and contains new ideas that I might put in the roadmap.
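The directory counts quoted for 'dir-level' follow from 256 directories per level. The Python sketch below reproduces that arithmetic; the second helper shows one plausible layout in which each level is named after one byte of the block hash in hexadecimal. That layout, the lower bound of 1 and both function names are assumptions made for illustration, not details confirmed by the release note.

```python
def directory_count(dir_level: int) -> int:
    """Number of leaf directories for a given dir-level (256 per level)."""
    if not 1 <= dir_level <= 5:  # the maximum of 5 is documented; the minimum of 1 is assumed
        raise ValueError("dir-level must be between 1 and 5")
    return 256 ** dir_level


def block_directory(hash_hex: str, dir_level: int) -> str:
    """Assumed layout: one directory per leading hash byte, e.g. 'a1/b2'."""
    if len(hash_hex) < 2 * dir_level:
        raise ValueError("hash too short for this dir-level")
    return "/".join(hash_hex[2 * i:2 * i + 2] for i in range(dir_level))
```

`directory_count(2)` gives 65536 and `directory_count(5)` gives 1099511627776, matching the figures in the entry and showing why high values make the one-time directory creation slow.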
0.0.3 27 Aug 2015 12:14 minor feature:
- Links are now saved and can be restored.
- A new test directory comes with the project, where we might put some files, directories or links to test and improve the project and avoid regressions.