(Please note: this post was originally published in the Spanish version of Security Art Work on 5 November 2012.)
A few weeks ago I was playing with Django when I accidentally deleted an application that I had already finished. It was not complex; it had only a few lines of code and I think I could have rewritten it in less than a day, but I saw in this mistake a chance to learn how to recover data from an SSD drive.
The configuration of this computer’s drive is as follows: GPT partitioning with multiple partitions formatted with ext4 (without LVM). In this type of situation I have always relied on the best-known tools in GNU/Linux environments: sleuthkit, autopsy, testdisk and photorec (these last two usually come in the same package), dd, grep…
Returning to my problem, as soon as I realized what I had done I tried to stay calm and follow my standard procedure: remount the partition that held the project as read-only and create an image of it with the ‘dd’ tool, so that further interaction with the disk would not overwrite any data.
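For readers who have not imaged a partition before, the idea can be sketched in Python. This is not what I ran (I used dd); it is only an illustration of the same block-by-block copy, and the function name `image_device` and the chunk size are my own choices for the example. The source is opened for reading only, and a SHA-256 digest is computed along the way so the image can be verified later.

```python
import hashlib


def image_device(src_path, dst_path, chunk_size=4 * 1024 * 1024):
    """Copy src_path to dst_path chunk by chunk.

    Returns (bytes_copied, sha256_hex). The source is opened read-only,
    so the copy itself never writes to the original device or file.
    """
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of device/file reached
                break
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
    return copied, digest.hexdigest()
```

On a real system the source would be a partition device node and the script would need enough privileges to read it; all later analysis should then be done on the image, never on the original partition.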
Once the image was created, my first choice was testdisk, a forensic analysis and data recovery tool that has given me very good results in the past. However, this time it reported that the partition was corrupt, so it could not retrieve the contents of the disk. I still don’t know whether I misconfigured one of the many options testdisk offers, or whether the tool does not work well with SSD drives or GPT partitions.
After this first failed attempt, I tried sleuthkit and autopsy. This time, everything went smoothly. After setting the initial parameters, I started “playing” with the options of autopsy.
However, it looked as if the procedure was going to take a long time, since the disk image was large and was stored on an external USB hard drive. As I didn’t want to sit in front of the PC doing nothing at 3am on a Friday, I cancelled the sleuthkit processes and launched photorec.
With photorec things changed: within seconds it was recovering multiple files, some of them deleted months earlier. The full run was going to take several hours, but since I had already seen that it was doing its job, I decided to leave the process running and continue the next morning. To my surprise, by the next day photorec had found a few thousand text files (txt, java (although I have never programmed in Java on this computer), html, py…).
Given the large number of files, there was nothing better than some grepping and a few regular expressions to find the directory and the files I had lost. Since it was Python code, I searched for strings that would be specific to my project, such as import statements, variable names I remembered, HTML tags…
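The same kind of search can be sketched in Python instead of grep. This is only an illustration: the function name `find_candidates`, the `min_hits` threshold and the example patterns are assumptions made for the sketch, not the exact expressions I used. It walks the recovery directory and keeps the files that match at least a given number of patterns.

```python
import os
import re


def find_candidates(root, patterns, min_hits=2):
    """Return {path: [matched patterns]} for files under root that
    match at least min_hits of the given regular expressions."""
    compiled = [re.compile(p, re.MULTILINE) for p in patterns]
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Recovered files may contain garbage bytes, so ignore
                # decoding errors instead of failing on them.
                with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: skip it
            matched = [rx.pattern for rx in compiled if rx.search(text)]
            if len(matched) >= min_hits:
                hits[path] = matched
    return hits
```

A call such as `find_candidates("recovered", [r"^from django", r"^import "])` (directory name and patterns are hypothetical) would list the files that look like Django source code. Requiring more than one match helps to discard files that merely mention a common word.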
After a few hours, although it seemed that I had recovered almost the whole project and I would be able to rewrite whatever files were missing, I saw that the project now had more than three times its original number of files. While going through the files to restore their original names (photorec has a drawback: it renames each file with alphanumeric characters, and in some cases it doesn’t detect the file extension correctly), I saw that many files were duplicated and that many others were incomplete.
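Exact duplicates among recovered files can be spotted automatically by hashing their contents. The following is a minimal sketch under that assumption; the name `group_duplicates` is mine, and it only detects byte-for-byte copies, not older versions or truncated fragments, which still have to be reviewed by hand.

```python
import hashlib
from collections import defaultdict


def group_duplicates(paths):
    """Group files with identical contents.

    Returns a list of groups (each a sorted list of paths) containing
    only the groups with more than one file.
    """
    groups = defaultdict(list)
    for path in paths:
        with open(path, "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
        groups[digest].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Keeping one file from each group reduces the amount of material that needs to be inspected manually.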
I immediately ruled out that those copies were backups: I had not considered it necessary to back up a project like this. What would life be without these small thrills? (Editor’s note: children, do not try this at home.)
Looking at the files carefully, I could see that they corresponded to old versions, as if every time I saved a changed file to disk, a new file had been created and the old one deleted, each stored in a different area of the disk. There were also a few partial files, which contained some of the functions from a source file but not all of them.
After some research I could tell the ‘versions’ apart. I ran several tests on the application and worked out which lines I had to add, delete or modify to get the application back to the way it was. However, this raised new questions. Was Eclipse storing several versions of the same file? Was it the operating system? Was it the SSD drive?
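Telling two recovered versions of a file apart is easier with a line-by-line diff. As an illustration (this is a sketch using Python's standard difflib module, not a record of how I actually compared the files; `diff_versions` is a name chosen for the example):

```python
import difflib


def diff_versions(old_text, new_text, old_name="old", new_name="new"):
    """Return the unified diff between two versions as a list of lines."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",  # inputs have no trailing newlines, so add none
        )
    )
```

Lines starting with `-` exist only in the first version and lines starting with `+` only in the second, which makes it quick to see which copy is the more recent one.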
As many readers will already have guessed, the cause of this behavior is the way in which the SSD drive stores data. However, I would like to leave the technical details for a future post, more technical and less theoretical than this one.
See you in the next entry!