performance - What are the most efficient idioms for streaming data from disk with constant space usage?


problem description

I need to stream large files from disk. Assume the files are larger than will fit in memory. Also, suppose I am doing some calculation on the data and the result is small enough to fit in memory. As a hypothetical example, suppose I need to compute an MD5 sum of a 200 GB file, and I need to do so with guarantees about how much RAM will be used.

In summary:

  • Needs constant space
  • As fast as possible
  • Assume very large files
  • Result fits in memory

question

What are the fastest ways to read/stream data from a file using constant space?

Ideas I had

If the file were small enough to fit in memory, then mmap on a POSIX system would be very fast. Unfortunately, that is not the case here. Is there any performance gain from using mmap with a small window size to map successive chunks of the file? Would the system call overhead of moving the mmap window down the file outweigh any benefits, or should I use a fixed buffer that I read into with fread?
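To make the windowed mmap idea concrete, here is a minimal sketch, assuming a POSIX system. The function name `sum_file_mmap`, the 1 MiB `WINDOW_SIZE`, and the plain byte sum standing in for a real digest such as MD5 are all illustrative assumptions.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed window size: 1 MiB, a multiple of the common page sizes,
 * so every mmap offset below stays page-aligned. */
#define WINDOW_SIZE ((off_t)1 << 20)

/* Hypothetical helper: adds up every byte of `path`, mapping one window
 * at a time. Returns 0 on success and -1 on error. */
int sum_file_mmap(const char *path, uint64_t *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    uint64_t sum = 0;
    for (off_t off = 0; off < st.st_size; off += WINDOW_SIZE) {
        off_t left = st.st_size - off;
        size_t len = (size_t)(left < WINDOW_SIZE ? left : WINDOW_SIZE);

        /* Map only the current window, never the whole file. */
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }

        for (size_t i = 0; i < len; i++)
            sum += p[i];

        /* Unmap before moving on, so at most one window is mapped. */
        munmap(p, len);
    }

    close(fd);
    *out = sum;
    return 0;
}
```

At most one window is mapped at a time, at the cost of one mmap/munmap pair per window. The sketch does not guard against the file shrinking while a window is mapped.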

I would not count on mmap being very fast (where "very fast" is defined as significantly faster than fread).

Grep used to use mmap, but switched back to fread. One of the reasons was stability (strange things happen with mmap if the file shrinks or an I/O error occurs). There is some discussion of this history.

You can compare the performance on your system with grep's --mmap option. On my system the difference in performance on a 200 GB file is negligible, but your mileage may vary!

In short, I would use fread with a fixed-size buffer. It is simpler to code, errors are easier to handle, and it will almost certainly be fast enough.
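A minimal sketch of the fixed-buffer fread loop, under stated assumptions: the function name `hash_file_fread` and the 64 KiB `CHUNK_SIZE` are illustrative choices, and a 64-bit FNV-1a hash stands in for MD5 so the example needs no external library.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed buffer size: 64 KiB. Tune for your system. */
#define CHUNK_SIZE (1u << 16)

/* Folds one chunk into a running 64-bit FNV-1a hash (stand-in for MD5). */
static uint64_t fnv1a_update(uint64_t hash, const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];
        hash *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
    }
    return hash;
}

/* Hypothetical helper: streams `path` through one fixed buffer.
 * Returns 0 on success and -1 on error. */
int hash_file_fread(const char *path, uint64_t *out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    unsigned char buf[CHUNK_SIZE];
    uint64_t hash = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
    size_t n;

    /* fread may return a short count; the loop ends when it returns 0. */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        hash = fnv1a_update(hash, buf, n);

    /* A zero return means end of file or an error; ferror tells them apart. */
    int failed = ferror(fp);
    fclose(fp);
    if (failed)
        return -1;

    *out = hash;
    return 0;
}
```

The program's own buffer stays at one `CHUNK_SIZE` array (plus stdio's internal buffer) no matter how large the file is, and an I/O error surfaces as an ordinary return value.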

