performance - What are the most efficient idioms for streaming data from disk with constant space usage?
Problem description
I need to stream large files from disk. Assume the files are larger than will fit in memory. Also, suppose I am doing some calculation on the data and the result is small enough to fit in memory. As a hypothetical example, suppose I need to compute the MD5 sum of a 200 GB file and need a guarantee about how much RAM will be used.
In summary:
- Needs constant space
- As fast as possible
- Assume very large files
- Result fits in memory
Question
What are the fastest ways to read/stream data from a file using constant space?
Ideas I had
If the file were small enough to fit in memory, then mmap would be very fast on a POSIX system. Unfortunately, that is not the case here. Is there any performance gain from using mmap with a small buffer size to map successive chunks of the file? Would the system call overhead of moving the mmap window down the file outweigh any benefits, or should I use a fixed buffer that I read into with fread?
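The sliding-window mmap idea described above can be sketched roughly as follows. This is a minimal sketch, not a tuned implementation: the function name stream_hash_mmap and its window parameter are hypothetical, and a 64-bit FNV-1a hash stands in for MD5 (which would need an external library). It assumes a POSIX system and requires the window to be a multiple of the page size so that every mapping offset is page-aligned.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hash a file by mapping one fixed-size window at a time.
 * At most `window` bytes of the file are mapped at any moment.
 * Returns 0 on success, -1 on error (including a misaligned window).
 * Note: this sketch does not guard against SIGBUS if the file
 * shrinks while it is mapped.
 */
static int stream_hash_mmap(const char *path, size_t window, uint64_t *out)
{
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
    struct stat st;
    long page;
    int fd;

    assert(path != NULL && out != NULL);

    /* mmap offsets must be page-aligned, so the window must be too. */
    page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || window == 0 || window % (size_t)page != 0)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    for (off_t off = 0; off < st.st_size; off += (off_t)window) {
        off_t left = st.st_size - off;
        size_t len = left < (off_t)window ? (size_t)left : window;
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);

        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (size_t i = 0; i < len; i++) {
            h ^= p[i];
            h *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
        }
        munmap(p, len); /* release this window before mapping the next */
    }

    close(fd);
    *out = h;
    return 0;
}
```

Each iteration costs one mmap and one munmap call, which is exactly the overhead the question asks about, so whether this beats a plain read loop has to be measured on the target system.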
I wouldn't be so sure that mmap would be very fast (where "very fast" is defined as much faster than fread).
Grep used to use mmap, but switched back to fread. One of the reasons was stability (strange things happen with mmap if the file shrinks or an I/O error occurs). There is some discussion about this history.
You can compare grep's performance on your system with the …
In short, I would use fread with a fixed-size buffer. It's easy to code, errors are easy to handle, and it is almost certainly fast enough.
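A minimal sketch of that approach is shown here. The function name stream_hash_fread and the 64 KiB buffer size are assumptions made for illustration, and a 64-bit FNV-1a hash is used as a stand-in for MD5, which would need an external library. Memory use is bounded by the buffer regardless of file size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical buffer size; 64 KiB is a guess, tune it by measurement. */
#define STREAM_BUF_SIZE (64 * 1024)

/*
 * Stream `path` through a fixed-size buffer and fold every byte
 * into a 64-bit FNV-1a hash.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int stream_hash_fread(const char *path, uint64_t *out)
{
    unsigned char buf[STREAM_BUF_SIZE];
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
    size_t n;
    int failed;
    FILE *fp;

    assert(path != NULL && out != NULL);

    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    /* fread returns 0 at end of file or on error; ferror tells them apart. */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            h ^= buf[i];
            h *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
        }
    }

    failed = ferror(fp);
    fclose(fp);
    if (failed)
        return -1;

    *out = h;
    return 0;
}
```

Error handling here amounts to checking one return value and one ferror call, which is the simplicity the answer is pointing at. Swapping in a real MD5 would only change the loop body that consumes each chunk.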