c# - .NET: How to efficiently check for uniqueness in a List<string> of 50,000 items?


In some library code, I have a List<string> that can contain 50,000 items or more.

Callers of the library can invoke methods that result in strings being added to the list. How do I efficiently check the strings for uniqueness?

Currently, just before adding a string, I scan the entire list and compare each string to the string being added. This starts to show scaling problems above 10,000 items.
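For reference, the linear-scan approach described above amounts to something like the following minimal sketch. The class name `NaiveStringRegistry` and the method `TryAdd` are hypothetical, not taken from the original library. Each add walks the whole list, so one add is O(n) and adding n strings costs O(n^2) comparisons in total, which is why it degrades as the list grows.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper illustrating the linear-scan uniqueness check from the question.
public sealed class NaiveStringRegistry
{
    private readonly List<string> _items = new List<string>();

    public int Count => _items.Count;

    // Returns true if the string was added, false if it was already present.
    // Every call compares against every stored string: O(n) per add.
    public bool TryAdd(string value)
    {
        foreach (string existing in _items)
        {
            if (string.Equals(existing, value, StringComparison.Ordinal))
            {
                return false;
            }
        }
        _items.Add(value);
        return true;
    }
}
```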

I will benchmark this, but I am interested in insights.

  • If I replace the List<> with a Dictionary<>, will ContainsKey() be appreciably faster as the list grows to 10,000 items and beyond?
  • If I defer the uniqueness check until after all items have been added, will it be faster? At that point I would need to check every element against every other element, which is still an O(n^2) operation.
    EDIT

    Some basic benchmark results: I created an abstract class that exposes two methods, Fill and Scan. Fill just fills the collection with n items (I used 50,000). Scan scans the collection m times (I used 5,000) to see whether a given value is present. Then I created one implementation of that class for List, and a second one for HashSet.

    The strings used were uniformly 11 characters long and were randomly generated by a method in the abstract class.
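The harness described above could look roughly like this minimal sketch. It is a reconstruction under assumptions, not the author's actual code: the names `CollectionTester`, `ListTester`, `HashSetTester`, and `RandomString`, the lowercase alphabet, and the seeded `Random` are all hypothetical. Because the scan probes are fresh random strings, almost all lookups are misses, which is the worst case for `List<string>.Contains` (a full linear search).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reconstruction of the Fill/Scan benchmark harness.
public abstract class CollectionTester
{
    private readonly Random _random;

    protected CollectionTester(int seed) => _random = new Random(seed);

    // Implementation-specific storage operations.
    protected abstract void Add(string value);
    protected abstract bool Contains(string value);

    // Random lowercase string; 11 characters matches the question's setup.
    protected string RandomString(int length = 11)
    {
        var chars = new char[length];
        for (int i = 0; i < length; i++)
        {
            chars[i] = (char)('a' + _random.Next(26));
        }
        return new string(chars);
    }

    // Fills the collection with n random strings.
    public void Fill(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Add(RandomString());
        }
    }

    // Performs m membership checks with fresh random strings; returns the hit count.
    public int Scan(int m)
    {
        int hits = 0;
        for (int i = 0; i < m; i++)
        {
            if (Contains(RandomString()))
            {
                hits++;
            }
        }
        return hits;
    }
}

public sealed class ListTester : CollectionTester
{
    private readonly List<string> _items = new List<string>();
    public ListTester(int seed) : base(seed) { }
    protected override void Add(string value) => _items.Add(value);
    protected override bool Contains(string value) => _items.Contains(value); // O(n) linear search
}

public sealed class HashSetTester : CollectionTester
{
    private readonly HashSet<string> _items = new HashSet<string>(StringComparer.Ordinal);
    public HashSetTester(int seed) : base(seed) { }
    protected override void Add(string value) => _items.Add(value);
    protected override bool Contains(string value) => _items.Contains(value); // O(1) on average
}
```

Wrapping `Fill(50000)` and `Scan(5000)` in a `System.Diagnostics.Stopwatch` reproduces the shape of the experiment; absolute times will depend on the machine and runtime.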

    A very basic micro-benchmark

      Hello from Cheeso.Tests.ListTester
      filling 50000 items...
      scanning 5000 items...
      Time to fill: 00:00:00.4428266
      Time to scan: 00:00:13.0291180

      Hello from Cheeso.Tests.HashSetTester
      filling 50000 items...
      scanning 5000 items...
      Time to fill: 00:00:00.3797751
      Time to scan: 00:00:00.4364431

    So, for strings of that length, HashSet is roughly 30x faster than List when checking for uniqueness (13.03 s vs. 0.44 s to scan). Also, for a collection of this size, HashSet has no penalty compared with List when adding items (0.38 s vs. 0.44 s to fill).

    The results are interesting but not rigorous. To get valid results, I would need warm-up intervals and several trials, with random selection of the implementation. But I believe that would move the numbers only slightly.
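A fairer run along the lines just described (untimed warm-up rounds, repeated trials, random ordering of the implementations) might be sketched as follows. This is an assumption-laden illustration, not the author's harness: `FairBenchmark`, `RandomStrings`, `RunTrial`, and `Run` are hypothetical names, and only `List<string>` and `HashSet<string>` are compared through `ICollection<string>`.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical sketch: warm-up, repeated trials, randomized implementation order.
public static class FairBenchmark
{
    // Generates `count` random lowercase strings of the given length.
    public static List<string> RandomStrings(Random random, int count, int length = 11)
    {
        var result = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            var chars = new char[length];
            for (int j = 0; j < length; j++)
                chars[j] = (char)('a' + random.Next(26));
            result.Add(new string(chars));
        }
        return result;
    }

    // One fill-then-scan trial against a fresh collection; returns elapsed time.
    public static TimeSpan RunTrial(ICollection<string> collection, List<string> fill, List<string> probes)
    {
        var sw = Stopwatch.StartNew();
        foreach (string s in fill) collection.Add(s);
        int hits = 0;
        foreach (string p in probes)
            if (collection.Contains(p)) hits++;
        sw.Stop();
        GC.KeepAlive(hits); // keep the lookup result observable
        return sw.Elapsed;
    }

    // Runs `warmup` untimed rounds, then `trials` timed rounds.
    // Returns the total timed duration per implementation name.
    public static Dictionary<string, TimeSpan> Run(int n, int m, int warmup, int trials, int seed)
    {
        var random = new Random(seed);
        var factories = new Dictionary<string, Func<ICollection<string>>>
        {
            ["List"] = () => new List<string>(),
            ["HashSet"] = () => new HashSet<string>(StringComparer.Ordinal),
        };
        var totals = new Dictionary<string, TimeSpan>();
        foreach (string name in factories.Keys) totals[name] = TimeSpan.Zero;

        for (int round = 0; round < warmup + trials; round++)
        {
            List<string> fill = RandomStrings(random, n);
            List<string> probes = RandomStrings(random, m);
            // Randomize which implementation runs first in this round.
            var order = new List<string>(factories.Keys);
            if (random.Next(2) == 1) order.Reverse();
            foreach (string name in order)
            {
                TimeSpan elapsed = RunTrial(factories[name](), fill, probes);
                if (round >= warmup) totals[name] += elapsed;
            }
        }
        return totals;
    }
}
```

Both implementations see the same fill and probe strings in each round, so differences in the totals come from the data structure rather than from the input.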

    Thanks everyone

    EDIT2

    After adding randomization and multiple trials, HashSet consistently outperforms List in this case, by about 20x.

    These results don't necessarily hold for strings of variable length, more complex objects, or different collection sizes.

You should use the HashSet<T> class, which is specifically designed for what you are doing.
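A minimal sketch of how a library might apply this, assuming it still needs the strings in insertion order: keep the `List<string>` for order and add a `HashSet<string>` for the uniqueness check. `UniqueStringList`, `TryAdd`, and `FindDuplicates` are hypothetical names. `HashSet<T>.Add` returns `false` when the value is already present, so the check and the insert are a single hash lookup. The static `FindDuplicates` method covers the deferred-check case from the question in one O(n) average-time pass instead of an all-pairs O(n^2) comparison.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: List<string> keeps insertion order,
// HashSet<string> provides O(1) average-time uniqueness checks.
public sealed class UniqueStringList
{
    private readonly List<string> _items = new List<string>();
    private readonly HashSet<string> _seen;

    public UniqueStringList(IEqualityComparer<string> comparer = null)
    {
        // Ordinal by default, matching List<string>.Contains semantics.
        _seen = new HashSet<string>(comparer ?? StringComparer.Ordinal);
    }

    public int Count => _items.Count;

    public IReadOnlyList<string> Items => _items;

    // Adds the value only if it has not been seen; returns false for duplicates.
    public bool TryAdd(string value)
    {
        if (!_seen.Add(value))
        {
            return false;
        }
        _items.Add(value);
        return true;
    }

    // Deferred check: returns every occurrence after the first of a repeated value.
    public static List<string> FindDuplicates(IEnumerable<string> values, IEqualityComparer<string> comparer = null)
    {
        var seen = new HashSet<string>(comparer ?? StringComparer.Ordinal);
        var duplicates = new List<string>();
        foreach (string value in values)
        {
            if (!seen.Add(value))
            {
                duplicates.Add(value);
            }
        }
        return duplicates;
    }
}
```

Passing `StringComparer.OrdinalIgnoreCase` to the constructor makes the check case-insensitive, if that is what the library's callers expect.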

