performance - Fast conversion of numeric data into fixed-width format files in Python
What is the fastest way of converting records holding numeric data into fixed-width format strings and writing them to a file in Python? For example, suppose record
is a large list containing objects with the attributes id
, x
, y
, and wt
, and we frequently need to flush them to an external file. The flushing can be done with the following snippet:

```python
with open(serial_fname(), "w") as f:
    for r in record:
        f.write("%07d %11.5e %11.5e %7.5f\n" % (r.id, r.x, r.y, r.wt))
```
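For reference, here is a self-contained sketch of what that format string produces (Python 3; the `Record` namedtuple is a stand-in of mine for the question's record objects, which are not shown):

```python
from collections import namedtuple

# Hypothetical stand-in for the question's record objects.
Record = namedtuple("Record", "id x y wt")
records = [Record(42, 1.5, -2.25, 0.125), Record(7, 3.0, 4.0, 0.5)]

# Same format string as in the question: zero-padded 7-digit id,
# two 11-char scientific-notation fields, one 7-char fixed-point field.
lines = ["%07d %11.5e %11.5e %7.5f\n" % (r.id, r.x, r.y, r.wt)
         for r in records]
for line in lines:
    print(line, end="")
```

Because every field has an explicit width, the columns line up across rows, which is what makes the output easy for downstream tools to parse positionally.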
However, my code is spending so much time generating the external files that there is too little time left for what it is actually supposed to be doing.
Amendment to the original question:
I ran into this problem while writing server software that keeps track of a global record set, pulling information from several "producer" systems and relaying any changes to the record set to "consumer" systems, in real time or near real time, in preprocessed form. Many of the consumer systems are Matlab applications.
I have listed the suggestions received so far (thanks) below, together with my responses:
- Dump only the changes, not the whole data set: I am actually doing this already, but the resulting change sets are still huge.
- Use a binary (or some otherwise more efficient) file format: I am fairly constrained by what Matlab can read reasonably efficiently, and besides, the format should be platform independent.
- Use a database: I am actually trying to bypass the current database solution, which is considered both too slow and too cumbersome, especially towards Matlab.
- Split the work off into a separate process: the dumping code currently runs in its own thread. However, because of the GIL, it still consumes the same core. I guess I could move it to a completely separate process.
I was trying to check whether numpy.savetxt could speed things up a bit, so I wrote the following simulation:

```python
# ft.py -- timing harness (Python 2, as used in the measurements below)
import sys
import numpy as np

fmt = '%7.0f %11.5e %11.5e %7.5f'
records = 10000

np.random.seed(1234)
aray = np.random.rand(records, 4)

def writ(f, aray=aray, fmt=fmt):
    # plain loop calling f.write
    fw = f.write
    for row in aray:
        fw(fmt % tuple(row))

def prin(f, aray=aray, fmt=fmt):
    # same loop, but using the print statement
    for row in aray:
        print >> f, fmt % tuple(row)

def stxt(f, aray=aray, fmt=fmt):
    # let numpy do the formatting and writing
    np.savetxt(f, aray, fmt)

nul = open('/dev/null', 'w')

def tonul(func, nul=nul):
    func(nul)

def main():
    print 'looping:'
    writ(sys.stdout)
    print 'savetxt:'
    stxt(sys.stdout)
```
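For anyone on a current interpreter, here is a rough Python 3 / modern NumPy re-creation of the same comparison (my own names; it times into an in-memory io.StringIO rather than /dev/null, so absolute numbers are not comparable to the original measurements):

```python
import io
import timeit
import numpy as np

fmt = '%7.0f %11.5e %11.5e %7.5f\n'
np.random.seed(1234)
aray = np.random.rand(10000, 4)

def writ(f):
    # plain loop calling f.write
    fw = f.write
    for row in aray:
        fw(fmt % tuple(row))

def stxt(f):
    # savetxt appends its own newline, so strip ours from the format
    np.savetxt(f, aray, fmt.rstrip('\n'))

for func in (writ, stxt):
    t = timeit.timeit(lambda: func(io.StringIO()), number=3)
    print('%s: %.1f msec per call' % (func.__name__, 1000 * t / 3))
```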
I got results (on my 2.4 GHz Core Duo MacBook Pro, with Mac OS X 10.5.8, Python 2.5.4 from the DMG on python.org, and numpy 1.4 rc1 built from sources) that slightly surprised me, but they are quite repeatable, so I thought they might be of interest:
```
$ py25 -mtimeit -s'import ft' 'ft.tonul(ft.writ)'
10 loops, best of 3: 101 msec per loop
$ py25 -mtimeit -s'import ft' 'ft.tonul(ft.prin)'
10 loops, best of 3: 98.3 msec per loop
$ py25 -mtimeit -s'import ft' 'ft.tonul(ft.stxt)'
10 loops, best of 3: 104 msec per loop
```
So, savetxt seems to be a few percent slower than a loop calling write... but good old print
(also in a loop) seems to be a few percent faster than write (I guess it is avoiding some kind of call overhead). I realize that a difference of 2.5% or so is not very important, but it is not in the direction I intuitively expected it to be, so I thought I would report it. (BTW, using an actual file instead of /dev/null
uniformly adds 6 or 7 milliseconds, so it does not change things much, one way or another.)