How We Prevent Data Corruption
We don't mess around when it comes to protecting your files, so we built multiple mechanisms into Duple to prevent data corruption.
Since we don't use servers, every client can write to the storage whenever it wants. There is nothing stopping it. If more than one client tries to modify the same resource at the same time, the result can be data corruption.
The file-locking mechanisms available were too basic to prevent this problem in a production environment. So we created our own: a robust, time-based file-locking mechanism that can orchestrate multiple concurrent accesses to the same resource. It handles time differences between devices and storage, deadlocks, network latency, and much more.
Basically, you can write to your private cloud simultaneously from a lot of devices; it's completely safe. We've actually tested the locking mechanism with 25 clients writing to the same file at the same time. It doesn't break.
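To give a rough idea of what a time-based lock can look like, here is a minimal Python sketch. It is an illustration under our own assumptions, not Duple's actual implementation: the `LeaseLock` class, the JSON lock-file layout, and the `ttl` parameter are all hypothetical. The lock is a file created atomically; it carries an expiry time, so a client that crashes while holding the lock cannot block the others forever.

```python
import json
import os
import time
import uuid


class LeaseLock:
    """Hypothetical lock file that expires after `ttl` seconds."""

    def __init__(self, path, ttl=30.0, clock=time.time):
        self.path = path
        self.ttl = ttl
        self.clock = clock              # injectable so tests can control time
        self.owner = uuid.uuid4().hex   # identifies this client

    def _try_create(self):
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        with os.fdopen(fd, "w") as f:
            json.dump({"owner": self.owner,
                       "expires": self.clock() + self.ttl}, f)
        return True

    def _is_stale(self):
        try:
            with open(self.path) as f:
                return json.load(f)["expires"] <= self.clock()
        except (FileNotFoundError, ValueError, KeyError):
            # Missing or unreadable lock file: do not treat it as stale.
            return False

    def acquire(self):
        """Return True if the lock was taken, False if someone else holds it."""
        if self._try_create():
            return True
        if self._is_stale():
            try:
                os.remove(self.path)    # break the expired lock
            except FileNotFoundError:
                pass
            return self._try_create()
        return False

    def release(self):
        """Remove the lock file, but only if this client still owns it."""
        try:
            with open(self.path) as f:
                if json.load(f).get("owner") != self.owner:
                    return
            os.remove(self.path)
        except (FileNotFoundError, ValueError):
            pass
```

This sketch trusts the local clock and breaks stale locks without any extra coordination, so two clients could still race while removing an expired lock. Handling clock differences between devices and storage, as described above, needs more than what is shown here.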
The storage architecture (of the Duple repository folder) works like a git repository (meaning it's a content-addressable store using blobs, trees, and commits), but it's not compatible with git and has nothing to do with the git software. Everything was built from scratch. We didn't use the git repository format because it was designed for source code, not for big files.
The blobs are stored using their content's hash as a key. Because the key is derived from the content, writing a blob can never overwrite different data, so nothing is lost. To go even further, the operations are atomic: the blobs are uploaded to a temporary location and checked before being moved into the content-addressable storage. This prevents half-uploaded data from ever being stored.
Simply put, even if the first line of defense (the Locking Mechanism) breaks, your files' content is safe and will not be corrupted.
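The "upload to a temporary location, check, then move" idea can be sketched in a few lines of Python. This is an illustration only, not Duple's code: the `put_blob` and `get_blob` helpers, the flat directory layout, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import os
import tempfile


def put_blob(store_dir, data):
    """Store `data` under its SHA-256 hash and return that hash (the key)."""
    os.makedirs(store_dir, exist_ok=True)
    key = hashlib.sha256(data).hexdigest()
    final_path = os.path.join(store_dir, key)
    if os.path.exists(final_path):
        return key  # same content is already stored; nothing to overwrite

    # 1. Write to a temporary file on the same filesystem as the store.
    fd, tmp_path = tempfile.mkstemp(dir=store_dir, prefix="tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # 2. Check what actually landed on disk before publishing it.
        with open(tmp_path, "rb") as f:
            if hashlib.sha256(f.read()).hexdigest() != key:
                raise IOError("upload verification failed")
        # 3. Move into place. os.replace is atomic when the source and
        #    destination are on the same filesystem.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return key


def get_blob(store_dir, key):
    """Read a blob back and verify that its content still matches its key."""
    with open(os.path.join(store_dir, key), "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != key:
        raise IOError("blob %s is corrupted" % key)
    return data
```

If two clients upload the same content at the same time, both compute the same key and both temporary files hold identical bytes, so whichever move happens last changes nothing.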
Journaling (Coming Soon)
A log of modifications will be kept so that the mutable data of the repository folder can be recreated. The blobs (the content of the private cloud's files) are immutable, in the sense that they can't be modified or corrupted. However, the files' metadata (tree, filename, path, date, etc.) is mutable and subject to corruption if the Locking Mechanism fails. A journaled repository makes it possible to recover from this problem.
To summarise, journaling allows the repository folder to be recreated as it was at any point in time if a corruption occurs.
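Since this feature is not released yet, the following Python sketch only illustrates the general principle of an append-only journal; it says nothing about the final design. The entry format (one JSON object per line with `ts`, `op`, `path`, and `blob` fields) and the `append_entry` and `replay` helpers are hypothetical.

```python
import json


def append_entry(journal_path, timestamp, op, path, blob=None):
    """Append one metadata change to the journal (one JSON object per line)."""
    entry = {"ts": timestamp, "op": op, "path": path, "blob": blob}
    with open(journal_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def replay(journal_path, until=None):
    """Rebuild the path -> blob-hash mapping, optionally as of time `until`.

    Assumes entries were appended in timestamp order.
    """
    tree = {}
    with open(journal_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            if until is not None and entry["ts"] > until:
                break
            if entry["op"] == "put":
                tree[entry["path"]] = entry["blob"]
            elif entry["op"] == "delete":
                tree.pop(entry["path"], None)
    return tree
```

Because entries are only ever appended, replaying the journal up to a chosen timestamp yields the metadata as it stood at that moment, while the immutable blobs provide the matching file content.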
Data Decay Protection (Coming Soon)
If you have fancy storage with RAID enabled, you are probably safe from data corruption. However, if you are using simpler storage such as a USB hard drive, there is nothing to prevent it. Data decays with time on all storage. Replacing your hard drive every 3 to 4 years is a good start for keeping your data healthy.
We created Duple to offer a professional private cloud solution without the hassle and the cost, so we wanted to solve this problem. To do so, we will add two protection features:
- Health Check: Duple will run random data-integrity checks on the files contained in the repository folder. The idea is to catch data corruption, self-repair, and warn about deterioration before it is too late. That way, the storage can be replaced before the degradation becomes irreversible.
- Parity File: Duple will keep PAR2 parity files so that it can self-repair corrupted data.
If bad blocks are found on your storage, Duple will repair the affected data and warn you about it, protecting your data.
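As a rough illustration of the Health Check idea (the feature is not released, so this is a sketch under our own assumptions rather than the real thing), the Python snippet below picks a random sample of blobs and recomputes their hashes. It assumes each blob is a file named after the SHA-256 hash of its content; the `health_check` function is hypothetical, and the PAR2 repair step is deliberately left out.

```python
import hashlib
import os
import random


def health_check(store_dir, sample_size, rng=None):
    """Return the names of sampled blobs whose content no longer matches."""
    rng = rng or random.Random()
    names = sorted(
        name for name in os.listdir(store_dir)
        if os.path.isfile(os.path.join(store_dir, name))
    )
    sample = rng.sample(names, min(sample_size, len(names)))

    corrupted = []
    for name in sample:
        digest = hashlib.sha256()
        with open(os.path.join(store_dir, name), "rb") as f:
            # Read in 1 MiB chunks so big files do not fill the memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != name:
            corrupted.append(name)
    return sorted(corrupted)
```

Any name returned by such a check would be a candidate for repair from parity data and a reason to warn the user about the state of the storage.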
Anti Corruption Sync
At every step of the synchronization, Duple double-checks for corrupted data.
In the worst-case scenario, if a corrupted file slips past all of our other defense mechanisms, it will simply not be synced to any device.
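The principle of refusing to propagate corrupted data can be sketched as follows. This is only an illustration with assumed names and layout (a hypothetical `sync_blob` function, blobs stored as files named after their SHA-256 hash), not a description of Duple's actual sync code.

```python
import hashlib
import os


def sync_blob(store_dir, key, dest_dir):
    """Copy a blob to a device folder; return False if it fails verification."""
    with open(os.path.join(store_dir, key), "rb") as f:
        data = f.read()
    # Check before sending: a corrupted source is never copied.
    if hashlib.sha256(data).hexdigest() != key:
        return False

    os.makedirs(dest_dir, exist_ok=True)
    tmp_path = os.path.join(dest_dir, key + ".part")
    with open(tmp_path, "wb") as f:
        f.write(data)
    # Check again after writing: a damaged copy is discarded.
    with open(tmp_path, "rb") as f:
        if hashlib.sha256(f.read()).hexdigest() != key:
            os.remove(tmp_path)
            return False
    os.replace(tmp_path, os.path.join(dest_dir, key))
    return True
```

A device only ever sees the final file once both checks have passed, so a failed sync leaves it with its previous, healthy state.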