As noted in the comments above, by trying to use a struct and by trying to store each jpg so it can be written out at once, you are making things much harder than they need to be. As the directions discuss, the FAT filesystem (which was on the card the images were taken from) stores chunks of each file in 512-byte sectors. To scan the card, all you need is a 512-byte buffer to handle the read and the immediate write to the output file. No structures are needed and there is no need to dynamically allocate memory.
The way to approach the read is to read each 512-byte block of data from the file. You then need to check whether the first 4 bytes of the block hold the jpg header. A short function to test that for you could be written as:
#include <stdio.h>
#include <stdlib.h>

#define FNLEN 128   /* if you need a constant, #define one (or more) */
#define BLKSZ 512

/* check if first 4-bytes in buf match jpg header */
int chkjpgheader (const unsigned char *buf)
{
    return buf[0] == 0xff &&
           buf[1] == 0xd8 &&
           buf[2] == 0xff &&
           buf[3] >> 4 == 0xe;
}
(You simply test whether each condition is true and return the result of the conditional.)
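Because only the high nibble of the fourth byte is compared, the check accepts any fourth byte from 0xe0 through 0xef. As a minimal sketch of the same idea, the test can also be written with a mask instead of a shift. The name chkjpgheader_mask is hypothetical and only used for illustration here; it is not part of the program in this answer.

```c
/* Hypothetical variant of the header check (illustrative name only):
 * compares the high nibble of the 4th byte using a mask instead of
 * a shift. Returns 1 if the first 4 bytes look like a jpg header,
 * 0 otherwise. The caller must supply at least 4 readable bytes.
 */
int chkjpgheader_mask (const unsigned char *buf)
{
    return buf[0] == 0xff &&
           buf[1] == 0xd8 &&
           buf[2] == 0xff &&
           (buf[3] & 0xf0) == 0xe0;     /* accepts 0xe0 through 0xef */
}
```

Either form should give the same result for every input; which one reads better is a matter of taste.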
When thinking about how to handle scanning for jpg headers and reading the file, you can do it all in a single loop that reads 512 bytes from the input while keeping a counter of the number of jpg headers found, which you also use as a flag to indicate that a header has been found. You read a block of data and test whether it starts with a header. If it does, and it is not the first header, close the output file for the last jpg written. Then create a new filename and open the new file (validating each step). Once a header has been found, write each block out as you loop, checking the start of each 512-byte block for the header signature. Repeat until you run out of file.
You can implement that similar to:
/* find each jpg header and write contents to separate file_000x.jpg files.
 * returns the number of jpg files successfully recovered.
 */
int recoverjpgs (FILE *ifp)
{
    char jpgname[FNLEN] = "";       /* jpg output filename */
    unsigned char buf[BLKSZ];       /* read buffer */
    int jpgcnt = 0;                 /* found jpg header count */
    size_t nbytes;                  /* no. of bytes read/written */
    FILE *fp = NULL;                /* FILE* pointer for jpg output */

    /* read until jpg header found */
    while ((nbytes = fread (buf, 1, BLKSZ, ifp)) > 0) {
        /* check if jpg header found */
        if (nbytes >= 4 && chkjpgheader(buf)) {
            /* if not 1st header, close current file */
            if (jpgcnt) {
                if (fclose (fp) == EOF) {   /* validate every close-after-write */
                    perror ("recoverjpg()-fclose");
                    return jpgcnt - 1;
                }
            }
            /* create output filename (e.g. file_0001.jpg) */
            sprintf (jpgname, "file_%04d.jpg", jpgcnt + 1);
            /* open next file/validate file open for writing */
            if ((fp = fopen (jpgname, "wb")) == NULL) {
                perror ("fopen-outfile");
                return jpgcnt;
            }
            jpgcnt += 1;    /* increment recovered jpg count */
        }
        /* if header found - write block in buf to output file */
        if (jpgcnt && fwrite (buf, 1, nbytes, fp) != nbytes) {
            perror ("recoverjpg()-fwrite");
            return jpgcnt - 1;
        }
    }
    /* if file opened, close final file */
    if (jpgcnt && fclose (fp) == EOF) {     /* validate every close-after-write */
        perror ("recoverjpg()-fclose");
        return jpgcnt - 1;
    }

    return jpgcnt;  /* return number of jpg files recovered */
}
(Note: jpgcnt is used both as a counter and as a flag, controlling when the first fclose() on a jpg file occurs and when the first write to the first file occurs.)
Look at the returns. Understand why jpgcnt or jpgcnt - 1 is being returned at different places in the function. Also understand why you always check the return of fclose() after a write has taken place. There are a number of errors that can occur when the final data is flushed to the file and the file is closed, and these would not be caught by checking the last write. So the rule is: always validate close-after-write. There is no need for the check when closing your input file.
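To make the close-after-write rule concrete, here is a minimal sketch of a diagnostic helper that reports which stage of a write failed. The name try_write and its return codes are hypothetical, chosen only for illustration; they are not part of the recovery program.

```c
#include <stdio.h>

/* Hypothetical diagnostic helper (not part of the answer's program):
 * writes 'nbytes' from 'buf' to 'path' and reports which stage failed.
 * Returns 0 on success, 1 if fopen failed, 2 if fwrite failed,
 * 3 if fclose failed.
 */
int try_write (const char *path, const void *buf, size_t nbytes)
{
    FILE *fp = fopen (path, "wb");

    if (!fp)
        return 1;

    if (fwrite (buf, 1, nbytes, fp) != nbytes) {
        fclose (fp);        /* write already failed, just release the stream */
        return 2;
    }

    /* data may still sit in the stdio buffer; fclose flushes it, so a
     * full disk or I/O error can surface here rather than in fwrite */
    if (fclose (fp) == EOF)
        return 3;

    return 0;
}
```

On a typical Linux system (assuming /dev/full exists and default stdio buffering), calling try_write ("/dev/full", "abc", 3) is a quick way to see the point: the small write usually lands in the stdio buffer without complaint, and the failure is only reported when fclose() flushes it.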
That's all you need. In main() you will open the input file and simply pass the open file stream to the recoverjpgs() function, saving the return to know how many jpg files were successfully recovered. It can be as simple as:
int main (int argc, char **argv) {

    FILE *fp = NULL;    /* input file stream pointer */
    int jpgcnt = 0;     /* count of jpg files recovered */

    if (argc < 2 ) {    /* validate 1 argument given for filename */
        fprintf (stderr, "error: insufficient input,\n"
                         "usage: %s filename\n", argv[0]);
        return 1;
    }

    /* open file/validate file open for reading */
    if ((fp = fopen (argv[1], "rb")) == NULL) {
        perror ("fopen-argv[1]");
        return 1;
    }

    if ((jpgcnt = recoverjpgs(fp)))
        printf ("recovered %d .jpg files.\n", jpgcnt);
    else
        puts ("no jpg files recovered.");

    fclose (fp);
}
That is the complete program. Just copy/paste the 3 pieces together and give it a try.
Example Use/Output
$ ./bin/recover ~/doc/c/cs50/recover/card.raw
recovered 50 .jpg files.
(The 50 files, file_0001.jpg to file_0050.jpg, will be created in the current directory -- and you can enjoy the balloons, flowers, girls, etc. shown in the jpg files.)
Look things over and let me know if you have further questions.
Edit Per-Comment Regarding Allocating and Storing Each File to Write Once
Even if you want to buffer each file fully before writing it once, the idea of using a struct with a single uint8_t (byte) and a bool to flag whether that struct is a header byte doesn't make much sense. Why? It makes a mess of the write routine, which would have to check every struct in an allocated block large enough to hold the entire card.raw file when writing, in order to catch the 4-struct sequence where each struct has its bool flag set true -- essentially duplicating all the testing that was done during the read to find the header bytes and set the bool struct member true to begin with.
As mentioned, if there were zillions of files, you would want to scan through the input stream from card.raw and save the bytes for each jpg in your buffer so that they could be written once to the file while the process continues (you could even fork the write to a separate process so the read could continue without waiting for the write, if you really wanted to tweak things).
Regardless, the approach will be the same. If you dynamically allocate buf, you can fill it with each jpg file and, when the next header is found, write the current contents of buf up to the beginning of the next header to your file (then move the next header just read to the start of buf) and repeat until you run out of input to check.
You will reuse the allocated storage for buf throughout the process, only expanding it if the current file requires more storage than is currently allocated (so buf ends up sized to hold the largest jpg found at the end of the day). This minimizes allocations and means the only reallocs required over all 50 files are those needed when a larger file is encountered. If the next 20 files all fit within the currently allocated buffer, no adjustment is needed and you keep filling buf over and over again with the different jpg file contents as they are recovered from the "forensic image" (sounds important).
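The grow-only-when-needed idea can be sketched as a small helper. This is a minimal sketch under stated assumptions, not necessarily how the full listing handles it: the name ensure_capacity and the doubling policy are hypothetical choices made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (name and doubling policy are assumptions):
 * make sure *buf can hold at least 'needed' bytes, doubling *bufsz
 * until it is large enough. Returns 0 on success, -1 on failure.
 * On failure the original block is left untouched and still valid.
 */
int ensure_capacity (uint8_t **buf, size_t *bufsz, size_t needed)
{
    size_t newsz = *bufsz ? *bufsz : 1;
    void *tmp;

    if (needed <= *bufsz)           /* already large enough, nothing to do */
        return 0;

    while (newsz < needed) {
        if (newsz > SIZE_MAX / 2)   /* guard against size_t overflow */
            return -1;
        newsz *= 2;
    }

    /* realloc through a temporary pointer so *buf is not lost on failure */
    tmp = realloc (*buf, newsz);
    if (!tmp)
        return -1;

    *buf = tmp;
    *bufsz = newsz;

    return 0;
}
```

In a read loop like the one described here, a call such as ensure_capacity (&buf, &bufsz, total + BLKSZ) before each fread() would guarantee room for the next block, and the buffer would only ever grow when a jpg larger than any seen so far is encountered.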
The only additions are a bufsz variable to track the current allocation size of buf and a total variable to track the total bytes read in each jpg file. Other than that, you are just rearranging where the files are written so that you wait until one complete jpg has been read into buf before opening the file, writing those bytes to it, and closing it immediately afterwards. (A short function was written to handle that, since it made sense to write a generic, reusable function to write a given number of bytes from a buffer to a file of a given name.)
The complete file could be written as follows.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define FNLEN 128       /* if you need a constant, #define one (or more) */
#define BLKSZ 512
#define JPGSZ (1<<15)   /* 32K initial allocation size (parenthesized so it expands safely) */
/* write 'nbytes' from 'buf' to 'fname'. returns number of bytes
 * written on success, zero otherwise.
 */
size_t writebuf2file (const char *fname, void *buf, size_t nbytes)
{
    FILE *fp = NULL;    /* FILE* pointer for jpg output */

    /* open file/validate file open for writing */
    if ((fp = fopen (fname, "wb")) == NULL) {
        perror ("writebuf2file-fopen");
        return 0;
    }
    /* write buffer to file/validate bytes written */
    if (fwrite (buf, 1, nbytes, fp) != nbytes) {
        perror ("writebuf2file()-fwrite");
        fclose (fp);    /* don't leak the stream on a failed write */
        return 0;
    }
    /* close file/validate every close-after-write */
    if (fclose (fp) == EOF) {
        perror ("writebuf2file-fclose");
        return 0;
    }

    return nbytes;
}

/* check if first 4-bytes in buf match jpg header */
int chkjpgheader (const unsigned char *buf)
{
    return buf[0] == 0xff &&
           buf[1] == 0xd8 &&
           buf[2] == 0xff &&
           buf[3] >> 4 == 0xe;
}

/* find each jpg header and write contents to separate file_000x.jpg files.
 * returns the number of jpg files successfully recovered.
 */
int recoverjpgs (FILE *ifp)
{
    char jpgname[FNLEN] = "";       /* jpg output filename */
    int jpgcnt = 0;                 /* found jpg header count */
    size_t nbytes,                  /* no. of bytes read/written */
           bufsz = JPGSZ,           /* tracks current allocation of buf */
           total = 0;               /* tracks total bytes in jpg file */
    uint8_t *buf = malloc (JPGSZ);  /* read buffer */

    if (!buf) {     /* validate every allocation/reallocation */
        perror ("malloc-buf");
        return 0;
    }

    /* read until jpg header found */
    while ((nbytes = fread (buf + total, 1, BLKSZ, ifp)) > 0) {
        /* check if jpg header found */
        if (nbytes >= 4 && chkjpgheader(buf + total)) {
            /* if not 1st header, write buffer to file, reset for next file */
            if (jpgcnt) {
                /* create output filename (e.g. file_0001.jpg) */
                sprintf (jpgname, "file_%04d.jpg", jpgcnt);
                /* write current buf to file */
i