Quality Issue #1 – Count vs. Count()
A measure of how good you are as a developer is how well you can write code. This is the start of a series of posts to help developers write better code. Today we will look at Count vs. Count() in .NET.
Count
Many collections in .NET expose a property called Count, including ArrayList, List<T>, HashSet<T>, and many more. The Count property is a value that represents the number of elements in the collection.
Here is the documentation for the List<T>.Count property.
List<T>.Count Property (System.Collections.Generic) | Microsoft Docs
Count()
.NET supports two interfaces, IEnumerable and IEnumerable<T>, which provide the ability to iterate over a collection. Along with these interfaces is a set of static extension methods defined in Enumerable, which add LINQ-based functionality for querying a collection of objects. One of those methods is Count(), which in the general case iterates over the collection to determine the number of items. When the source implements ICollection<T>, Count() returns that collection's Count instead, but only after a runtime type check and an interface call.
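To make the distinction concrete, here is a minimal sketch. The Generate helper and the itemsProduced counter are hypothetical names introduced only for this illustration. It shows that Count() on a lazily generated sequence has to walk every element, while the Count property on a List<T> reads a stored value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Counts how many elements the lazy sequence has handed out so far.
int itemsProduced = 0;

// Hypothetical helper: a lazy sequence with no Count property of its own.
IEnumerable<int> Generate(int n)
{
    for (int i = 0; i < n; i++)
    {
        itemsProduced++;
        yield return i;
    }
}

// Count() has no shortcut here, so it enumerates all 1000 elements.
int lazyCount = Generate(1000).Count();
Console.WriteLine($"{lazyCount} counted, {itemsProduced} produced");

// ToList() enumerates the sequence once more and stores the size.
List<int> list = Generate(1000).ToList();

// The Count property reads the stored size without enumerating anything.
int propertyCount = list.Count;
Console.WriteLine(propertyCount);
```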
Here is the documentation for the IEnumerable interface and Enumerable.Count method.
IEnumerable Interface (System.Collections) | Microsoft Docs
IEnumerable<T> Interface (System.Collections.Generic) | Microsoft Docs
Enumerable.Count Method (System.Linq) | Microsoft Docs
Comparison
So why do we care about the difference between Count and Count()? The property simply reads a value in memory. The extension method must either check the runtime type of the collection and call through an interface or, failing that, iterate over the entire collection to count its items.
There is a big performance difference between these two approaches. What is worse, when Count() has to iterate, the difference grows with the size of the collection. This might not seem like a big deal, but it does add up. If you have a large code base or a high-scale application, you will begin to see the impact over time.
Here is an example of code that gets a collection of people using List<Person>. We then use Count and Count() to get the number of people in the collection.
We measure the performance difference between the two approaches. One can see that the Count() extension method takes 7 times longer than the Count property. For sequences that Count() must iterate, this gets worse as the number of items in the collection increases.
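The article's own sample is in the repository linked under References. As a rough stand-in, a comparison along these lines could look like the sketch below. The Person record, the list size, and the iteration count are all assumptions made for illustration, and a Stopwatch loop is only a crude measurement, so the timings it prints will vary by machine and runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Assumed test data: 100,000 hypothetical Person objects.
List<Person> people = Enumerable.Range(0, 100_000)
    .Select(i => new Person($"Person {i}"))
    .ToList();

const int iterations = 1_000_000;
long total = 0;

// Time the Count property.
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
    total += people.Count;
}
stopwatch.Stop();
Console.WriteLine($"Count property: {stopwatch.ElapsedMilliseconds} ms");

// Time the Count() extension method on the same list.
stopwatch.Restart();
for (int i = 0; i < iterations; i++)
{
    total += people.Count();
}
stopwatch.Stop();
Console.WriteLine($"Count() method: {stopwatch.ElapsedMilliseconds} ms");

// Using the accumulated total keeps the loops from being discarded as dead code.
Console.WriteLine(total);

// Hypothetical type used only by this sketch.
record Person(string Name);
```

For numbers you intend to rely on, a dedicated benchmarking harness is a better choice than a hand-written loop.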
If you don’t believe that this is a quality issue, check out the code analysis tip in Visual Studio by hovering over the Count() method. There you will see code analysis rule CA1829.
The description of rule CA1829 provides a clear reason as to why not to use the Count() method.
“The Count LINQ method was used on a type that supports an equivalent, more efficient Length or Count property.”
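As an illustration of what the rule targets, the sketch below shows the flagged pattern next to the preferred one for both a list and an array. The variable names are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var names = new List<string> { "Ada", "Grace", "Linus" };
string[] array = names.ToArray();

// The pattern CA1829 is about: LINQ Count() on types that already expose Count or Length.
int viaLinqOnList = names.Count();
int viaLinqOnArray = array.Count();

// Preferred: read the property directly.
int viaCountProperty = names.Count;
int viaLengthProperty = array.Length;

// The results are identical; only the cost of getting them differs.
Console.WriteLine(viaLinqOnList == viaCountProperty && viaLinqOnArray == viaLengthProperty);
```

If you want the build to surface this consistently, one option is to raise the rule's severity in an .editorconfig file with a line such as `dotnet_diagnostic.CA1829.severity = warning`.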
Here is the documentation for the Code Analysis Performance Rule CA1829.
Anecdote
I remember being one of four architects on Fidelity’s Active Trader Pro. This is an amazing product put together by about 80 to 90 awesome developers. Count vs. Count() was one of the performance problems we would find in our code reviews. An even bigger challenge was the overuse of LINQ queries written in a fluent programming style. Finding bad LINQ queries was a large part of our performance optimization during the project. This leads me to one of my favorite things to tell developers: “LINQ is convenient, not performant”. Interestingly, I am on a project at the moment where we are addressing quality issues such as Count vs. Count().
Conclusion
It is our hope that you have learned the proper use of Count vs. Count() and that this is the beginning of your journey to improve the quality of the code that you write.
Appreciation
Thanks to our friends at MILL5 for sponsoring this article.
Author(s):
Richard Crane, Founder/CTO
Disclaimer:
All source code is licensed under the Apache 2.0 license.
References:
Count vs. Count() Code Example
https://github.com/MILL5/quality/tree/main/fundamentals/Count
Introducing FastSearch, a very fast string search for objects and lookups
Welcome to our first article regarding fast in-memory search of a list of objects using FastSearch, a .NET class library for fast string-based search brought to you by the team at MILL5.
The motivation behind this library is simple: we want very fast search of a large list of objects based on strings so that we can display the results to users. Of course, there are many ways search can be done in .NET by writing very little code. One way is to loop through a list of objects using the String class to search string properties of your objects. Unfortunately, this type of search is not very fast and gets worse the larger the list gets.
A variation on using the String class is to use LINQ. It is extremely easy to write a small amount of LINQ code which queries a list of objects. The amazing part about this approach is how simple this is and how much .NET developers rely on LINQ queries every day. Unfortunately, there is a lot of LINQ code that is the source of performance problems everywhere.
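The two baseline approaches described above can be sketched as follows. The Person record and the sample names are assumptions for illustration, and this is not code from the FastSearch library. Both versions scan the whole list on every search, which is why they slow down as the list grows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<Person>
{
    new Person("Ada Lovelace"),
    new Person("Grace Hopper"),
    new Person("Alan Turing"),
};

// Approach 1: an explicit loop over the list using String.Contains.
var loopMatches = new List<Person>();
foreach (Person person in people)
{
    if (person.Name.Contains("ace", StringComparison.OrdinalIgnoreCase))
    {
        loopMatches.Add(person);
    }
}

// Approach 2: the same scan expressed as a LINQ query.
List<Person> linqMatches = people
    .Where(p => p.Name.Contains("ace", StringComparison.OrdinalIgnoreCase))
    .ToList();

Console.WriteLine(string.Join(", ", loopMatches.Select(p => p.Name)));
Console.WriteLine(loopMatches.SequenceEqual(linqMatches));

// Hypothetical type used only by this sketch.
record Person(string Name);
```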
String Contains and LINQ queries are not optimized and require some help to get better search performance. We will see the different optimizations we do per algorithm and compare their performance. Here is the list of algorithms we will be comparing.
| Algorithm | Description |
| --- | --- |
| String Contains | Search a list using the String.Contains method |
| LINQ | Search a list using a LINQ query |
| Hash | Search a list by precomputing hashes for all possible search patterns |
| Character Sequence | Search a list using a precomputed index structure which maintains a character sequence tree of all possible search patterns |
Each of these algorithms will have different performance characteristics for indexing and searching. Let us look at the index performance for each algorithm.
Notice that the Hash and Character Sequence algorithms take significantly longer to index than the String Contains and LINQ algorithms. That is because String Contains and LINQ do very little to build an index or precompute values for searching. Fortunately for the Hash and Character Sequence algorithms, we assume that building indexes happens once, or so infrequently that our users will not be impacted by the overhead of the indexing process. Of course, when we do have to update our indexes, we do so in the background so that the performance impact is not observed by our users. Once the indexes are rebuilt, we swap out the old indexes for the new indexes.
Let us turn our attention to search performance. Notice that the String Contains and LINQ algorithms take significantly longer to search. That is because we have not done much to improve the performance of these algorithms. Instead, we put the performance optimizations into the Hash and Character Sequence algorithms.
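One way to picture the rebuild-and-swap step is the sketch below. It is only an assumption about how such a swap could be done, not FastSearch's actual code. IndexHolder is a hypothetical type, and a HashSet<string> stands in for the real index. The new index is built on a background task, then published with a single atomic reference swap so readers never see a half-built index.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var holder = new IndexHolder(new[] { "alpha", "beta" });
bool before = holder.Contains("gamma");
Console.WriteLine(before);

// Rebuild in the background; searches keep using the old index until the swap.
await holder.RebuildAsync(new[] { "alpha", "beta", "gamma" });
bool after = holder.Contains("gamma");
Console.WriteLine(after);

// Hypothetical holder type; a HashSet stands in for the real index structure.
sealed class IndexHolder
{
    private HashSet<string> _index;

    public IndexHolder(IEnumerable<string> items) => _index = Build(items);

    // Readers take whichever index reference is currently published.
    public bool Contains(string key) => Volatile.Read(ref _index).Contains(key);

    public Task RebuildAsync(IEnumerable<string> items) => Task.Run(() =>
    {
        // The expensive build happens off to the side.
        HashSet<string> fresh = Build(items);

        // Publishing the new index is a single atomic reference swap.
        Interlocked.Exchange(ref _index, fresh);
    });

    private static HashSet<string> Build(IEnumerable<string> items) =>
        new HashSet<string>(items, StringComparer.OrdinalIgnoreCase);
}
```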
Notice that searching using the Hash and Character Sequence algorithms is very fast compared to String Contains and LINQ. Most of the optimization is due to the data structure used for the index. We do perform other optimizations like precomputing case-insensitive strings, binary searching, and parallelism, but these optimizations pale in comparison to the performance gained by using efficient data structures.
The Hash algorithm uses a map keyed by the precomputed hashes of all possible search patterns within the list of objects. When a search is performed, the hash of the search pattern is computed and used to find all matching objects within the map.
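The idea can be sketched under some loud assumptions: the code below is not FastSearch's implementation, it uses a plain Dictionary keyed by the substring itself (so the runtime's string hashing plays the role of the precomputed hash), and it indexes names rather than arbitrary objects. Every lower-cased substring of every item becomes a key, which makes search a single lookup at the cost of a large index.

```csharp
using System;
using System.Collections.Generic;

var names = new[] { "Ada", "Alan", "Grace" };

// Index: each lower-cased substring maps to the items that contain it.
var index = new Dictionary<string, HashSet<string>>();
foreach (string name in names)
{
    string text = name.ToLowerInvariant();
    for (int start = 0; start < text.Length; start++)
    {
        for (int length = 1; start + length <= text.Length; length++)
        {
            string pattern = text.Substring(start, length);
            if (!index.TryGetValue(pattern, out var bucket))
            {
                bucket = new HashSet<string>();
                index[pattern] = bucket;
            }
            bucket.Add(name);
        }
    }
}

// Search: normalize the pattern, then do one dictionary lookup.
IReadOnlyCollection<string> Search(string pattern)
{
    if (index.TryGetValue(pattern.ToLowerInvariant(), out var hits))
    {
        return hits;
    }
    return Array.Empty<string>();
}

Console.WriteLine(string.Join(", ", Search("LA")));
Console.WriteLine(Search("a").Count);
```

The number of substrings grows quadratically with the length of each string, which is consistent with the higher indexing cost noted earlier for this algorithm.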
The Character Sequence algorithm builds a tree of all possible character sequences. When a search is performed, the search pattern is broken into its character sequence. The search is performed by walking the tree to find all matching objects that contain that sequence.
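A naive way to realize such a tree is a suffix trie, sketched below. This is an assumption chosen for illustration rather than the structure FastSearch actually uses, and SequenceTrie is a hypothetical type. Each node records the items whose text passes through it, so a search is a walk of one step per pattern character.

```csharp
using System;
using System.Collections.Generic;

var trie = new SequenceTrie();
foreach (string name in new[] { "Ada", "Alan", "Grace" })
{
    trie.Add(name);
}

Console.WriteLine(string.Join(", ", trie.Search("la")));
Console.WriteLine(trie.Search("a").Count);

// Hypothetical type: a naive suffix trie over lower-cased text.
sealed class SequenceTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public readonly HashSet<string> Items = new HashSet<string>();
    }

    private readonly Node _root = new Node();

    // Insert every suffix of the item; each visited node remembers the item.
    public void Add(string item)
    {
        string text = item.ToLowerInvariant();
        for (int start = 0; start < text.Length; start++)
        {
            Node node = _root;
            for (int i = start; i < text.Length; i++)
            {
                if (!node.Children.TryGetValue(text[i], out var next))
                {
                    next = new Node();
                    node.Children[text[i]] = next;
                }
                next.Items.Add(item);
                node = next;
            }
        }
    }

    // Walk one character at a time; the node reached holds every matching item.
    public IReadOnlyCollection<string> Search(string pattern)
    {
        Node node = _root;
        foreach (char c in pattern.ToLowerInvariant())
        {
            if (!node.Children.TryGetValue(c, out var next))
            {
                return Array.Empty<string>();
            }
            node = next;
        }
        return node.Items;
    }
}
```

In this sketch an empty pattern returns no results, because the root node never stores items.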
The Hash and Character Sequence algorithms offer significant performance improvements over the String Contains and LINQ algorithms. The clear winner, though, is the Character Sequence algorithm. It has the fastest search performance and better index performance than the Hash algorithm. Its data structure is also very compact and uses very little memory.
Take note that search using the Hash and Character Sequence algorithms is 1235 times more scalable, with a significant increase in performance (i.e., a decrease in time spent). That frees up resources which your application can use to perform other operations.
In a future article, we will go into the data structures used for each of these algorithms. In addition, we will be improving this library over time to offer better and faster searching. If you want to use the FastSearch library, add it to your .NET project from NuGet at https://www.nuget.org/packages/FastSearch/. The source is available at https://github.com/MILL5/FastSearch.
Enjoy using FastSearch for your own needs.
Author(s):
Richard Crane, Founder/CTO
Special Thanks:
Steve Tarmey, Principal Architect
James Pansarasa, Chief Architect
Disclaimer:
FastSearch is licensed under the Apache 2.0 license.
References:
FastSearch – NuGet
https://www.nuget.org/packages/FastSearch/
FastSearch – GitHub
https://github.com/MILL5/FastSearch
Rabin–Karp algorithm
https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm
Boyer–Moore–Horspool algorithm
https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm