Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c - Atomicity of `write(2)` to a local filesystem

Apparently POSIX states that

Either a file descriptor or a stream is called a "handle" on the open file description to which it refers; an open file description may have several handles. […] All activity by the application affecting the file offset on the first handle shall be suspended until it again becomes the active file handle. […] The handles need not be in the same process for these rules to apply. -- POSIX.1-2008

and

If two threads each call [the write() function], each call shall either see all of the specified effects of the other call, or none of them. -- POSIX.1-2008

My understanding is that when one process issues write(handle, data1, size1) and a second process issues write(handle, data2, size2), the writes can occur in either order, but data1 and data2 must each end up intact and contiguous in the file.

But running the following code gives me unexpected results.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>

static void die(const char *s)
{
  perror(s);
  abort();
}

int main(void)
{
  unsigned char buffer[3];
  char *filename = "/tmp/atomic-write.log";
  int fd, i, j;
  pid_t pid;
  unlink(filename);
  /* XXX Adding O_APPEND to the flags cures it. Why? */
  fd = open(filename, O_CREAT|O_WRONLY/*|O_APPEND*/, 0644);
  if (fd < 0)
    die("open failed");
  for (i = 0; i < 10; i++) {
    pid = fork();
    if (pid < 0)
      die("fork failed");
    else if (! pid) {
      /* With buffer[3], i % 1 is 0, so j is always 3: '-', one letter, newline. */
      j = 3 + i % (sizeof(buffer) - 2);
      memset(buffer, i % 26 + 'A', sizeof(buffer));
      buffer[0] = '-';
      buffer[j - 1] = '\n';
      for (i = 0; i < 1000; i++)
        if (write(fd, buffer, j) != j)
          die("write failed");
      exit(0);
    }
  }
  while (wait(NULL) != -1)
    /* NOOP */;
  exit(0);
}

I ran this on Linux and on Mac OS X 10.7.4. Running grep -a '^[^-]\|^..*-' /tmp/atomic-write.log shows that some writes are not contiguous or overlap (Linux) or are plainly corrupted (Mac OS X).
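The same check can be sketched in C instead of grep. The helper name count_suspicious_lines is hypothetical, and the sketch assumes the record format the test program writes: a '-' in the first column and no other '-' on the line.

```c
#include <stdio.h>

/* Hypothetical helper mirroring the grep pattern: a line is suspicious if
 * its first byte is not '-' or if a '-' appears after the first column.
 * Returns the number of suspicious lines, or -1 if the file cannot be opened. */
long count_suspicious_lines(const char *path)
{
  FILE *fp = fopen(path, "rb");
  long bad = 0;
  long col = 0;       /* column of the next byte within the current line */
  int line_bad = 0;   /* set to 1 once the current line looks damaged */
  int c;

  if (fp == NULL)
    return -1;
  while ((c = fgetc(fp)) != EOF) {
    if (c == '\n') {
      bad += line_bad;
      col = 0;
      line_bad = 0;
      continue;
    }
    if ((col == 0 && c != '-') || (col > 0 && c == '-'))
      line_bad = 1;
    col++;
  }
  bad += line_bad;    /* a final line without a trailing newline */
  fclose(fp);
  return bad;
}
```

Calling count_suspicious_lines("/tmp/atomic-write.log") after a run should return 0 if every record survived intact, and a positive count otherwise.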

Adding the flag O_APPEND in the open(2) call fixes this problem. Nice, but I do not understand why. POSIX says

O_APPEND If set, the file offset shall be set to the end of the file prior to each write.

but this is not the problem here. My sample program never calls lseek(2); the processes share the same open file description and thus the same file offset.
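The shared offset can be observed directly. The following is a minimal sketch, not part of the original program: the function name parent_offset_after_child_write and its path argument are made up for illustration. A forked child writes through the inherited descriptor, and the parent, which never writes, then reads back its own offset.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the parent's file offset after a forked
 * child has written msg through the inherited descriptor, or -1 on error. */
off_t parent_offset_after_child_write(const char *path, const char *msg)
{
  size_t len = strlen(msg);
  off_t pos;
  pid_t pid;
  int status;
  int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);

  if (fd < 0)
    return -1;
  pid = fork();
  if (pid < 0) {
    close(fd);
    return -1;
  }
  if (pid == 0) {
    /* Child: write through the inherited descriptor, then exit. */
    _exit(write(fd, msg, len) == (ssize_t)len ? 0 : 1);
  }
  if (waitpid(pid, &status, 0) < 0 ||
      !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
    close(fd);
    return -1;
  }
  /* The parent never wrote, yet its offset reflects the child's write. */
  pos = lseek(fd, 0, SEEK_CUR);
  close(fd);
  return pos;
}
```

With msg set to "hello" the function returns 5, because parent and child refer to one open file description with a single offset.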

I have already read similar questions on Stack Overflow, but they still do not fully answer my question.

"Atomic write on file from two process" does not specifically address the case where the processes share the same file description (as opposed to the same file).

"How does one programmatically determine if “write” system call is atomic on a particular file?" says that

The write call as defined in POSIX has no atomicity guarantee at all.

But as cited above it does have some. And what’s more, O_APPEND seems to trigger this atomicity guarantee although it seems to me that this guarantee should be present even without O_APPEND.
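For reference, the O_APPEND rule quoted from POSIX can be seen in isolation with a single process. This is only a sketch with a hypothetical function name, size_after_rewind_and_append. It shows where the data lands, not whether concurrent writes are atomic.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes 3 bytes, rewinds to offset 0, writes 2 more,
 * and returns the final file size, or -1 on error. */
off_t size_after_rewind_and_append(const char *path)
{
  off_t end;
  int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC | O_APPEND, 0644);

  if (fd < 0)
    return -1;
  if (write(fd, "abc", 3) != 3 ||
      lseek(fd, 0, SEEK_SET) != 0 ||   /* rewind to the start of the file */
      write(fd, "de", 2) != 2) {       /* O_APPEND moves the offset to EOF first */
    close(fd);
    return -1;
  }
  end = lseek(fd, 0, SEEK_END);
  close(fd);
  return end;
}
```

The function returns 5: the second write is appended despite the rewind. Without O_APPEND it would overwrite the first two bytes and the size would stay at 3.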

Can you explain this behaviour further?

1 Reply

man 2 write on my system sums it up nicely:

Note that not all file systems are POSIX conforming.

Here is a quote from a recent discussion on the ext4 mailing list:

Currently concurrent reads/writes are atomic only wrt individual pages, however are not on the system call. This may cause read() to return data mixed from several different writes, which I do not think it is good approach. We might argue that application doing this is broken, but actually this is something we can easily do on filesystem level without significant performance issues, so we can be consistent. Also POSIX mentions this as well and XFS filesystem already has this feature.

This is a clear indication that ext4 -- to name just one modern filesystem -- doesn't conform to POSIX.1-2008 in this respect.

