web crawler - Writing a Faster Python Spider -


I'm writing a spider in Python to crawl a site. Trouble is, I need to check 2.5 million pages, so I can actually use some help to optimize for speed.

Whatever I have to do, checks the pages for a certain number, and if this is the spider found to enter the link on the page is very simple, sort it through many pages Is required.

I'm completely new to Python, but using Java and C ++ first I have not started coding it yet, so any recommendations to include libraries or frameworks are very much Good adaptation tips are also very appreciated.

You can use it as Google, either through (especially with Python: and ), Or.

The traditional line of thought, writes its program in standard Python, if you think that it is very slow, and optimizes specific slow spots.

If you are only spreading a site, consider using wget -r ().


Comments

Popular posts from this blog

c# - How to capture HTTP packet with SharpPcap -

jquery - SimpleModal Confirm fails to submit form -

php - Multiple Select with Explode: only returns the word "Array" -