About ValueUtils
ValueUtils implements Equals and GetHashCode for you. By using runtime code-generation, the performance overhead is kept small; ValueObject<> generally outperforms alternatives such as Tuple<>, struct or anonymous types (see benchmarks below).
The library is available on NuGet (for import or direct download) as ValueUtils. Though it's implemented in C#, it's just as applicable to VB.NET classes.
Contributions welcome! If you've found a bug, are missing a feature, or just have a question, please open a GitHub issue or pull request, or send an email to 'eamon (at) nerbonne (dot) org'.
Usage:
The easy way
The easiest way to use value semantics is to derive from ValueObject<>, for example:
using ValueUtils;
sealed class MyValueObject : ValueObject<MyValueObject> {
    public int A, B, C;
    public string X, Y, Z;
    // ...
}
A class deriving from ValueObject<T> implements IEquatable<T> and has Equals(object), Equals(T), GetHashCode() and the == and != operators implemented in terms of its fields.
Explicit usage
You can also generate delegates (or an IEqualityComparer<>) for hashing and equality comparison for any type, including types in other assemblies that you don't control. Given the following example class:
using System; // for DateTime
class ExampleClass {
    public string myMember;
    protected readonly DateTime supports_readonly_too;
    int private_int;
    // ...
}
The generated hash function can be explicitly used as follows:
using System; // for Func<>
using ValueUtils;
Func<ExampleClass, int> hashfunc = FieldwiseHasher<ExampleClass>.Instance;
// or call immediately with type inference
int hashcode = FieldwiseHasher.Hash(my_example_object);
The generated equality function can be explicitly used as follows:
using System; // for Func<>
using ValueUtils;
Func<ExampleClass, ExampleClass, bool> equalityComparer = FieldwiseEquality<ExampleClass>.Instance;
// or call immediately with type inference
bool areEqual = FieldwiseEquality.AreEqual(my_example_object, another_example_object);
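The two delegates can be plugged into any API that expects an IEqualityComparer<T>. The following is a minimal sketch of one way to do that, assuming a hypothetical adapter class named DelegateEqualityComparer<T>. The adapter is not part of ValueUtils and is not the library's own comparer; it only relies on the delegate shapes shown above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical adapter (not part of ValueUtils): wraps an equality delegate
// and a hash delegate in an IEqualityComparer<T>.
public sealed class DelegateEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> areEqual;
    private readonly Func<T, int> getHash;

    public DelegateEqualityComparer(Func<T, T, bool> areEqual, Func<T, int> getHash)
    {
        this.areEqual = areEqual ?? throw new ArgumentNullException(nameof(areEqual));
        this.getHash = getHash ?? throw new ArgumentNullException(nameof(getHash));
    }

    // Both members simply forward to the wrapped delegates.
    public bool Equals(T x, T y) => areEqual(x, y);
    public int GetHashCode(T obj) => getHash(obj);
}

// Usage sketch, assuming ValueUtils is referenced (delegates as shown above):
// var comparer = new DelegateEqualityComparer<ExampleClass>(
//     FieldwiseEquality<ExampleClass>.Instance,
//     FieldwiseHasher<ExampleClass>.Instance);
// var set = new HashSet<ExampleClass>(comparer);
```

Keeping the adapter generic means the same class works for types you cannot modify, which is the scenario the explicit delegates are meant for.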
Usage in structs
The above delegates are considerably faster than the built-in ValueType-provided defaults for structs (which use reflection on every call), which is why they're a good fit to help implement GetHashCode and Equals for your own structs. Unfortunately, you can't use inheritance to mix in the generated code, so you'll need to use the explicit calls described above. For example:
using System; // for IEquatable<>
using ValueUtils;
struct ExampleStruct : IEquatable<ExampleStruct> {
    int some, members, here;
    // ...
    public bool Equals(ExampleStruct other) {
        return FieldwiseEquality.AreEqual(this, other);
    }
    public override bool Equals(object obj) {
        return obj is ExampleStruct && Equals((ExampleStruct)obj);
    }
    public override int GetHashCode() {
        return FieldwiseHasher.Hash(this);
    }
}
Limitations and gotchas
Cyclical data structures: ValueObject<> supports self-referential types (like tree structures or a singly linked list), but does not support cyclical data, such as a doubly linked list. Whenever a cycle is encountered, the hash and equality operations recurse without terminating until the stack overflows.
Inheritance: Equality is implemented on a per-type basis, and that means inheritance gets confusing. It's OK to have a base class (and base class fields will affect hash and equality), but if you use the base class's equality and/or hash implementation on a subclass instance, the code will seem to work but will only consider the fields of the base class. Best practice: don't create subclasses that add new fields; and if you do, then at least never use the base-class equality and hashcode implementations. This is why ValueObject verifies that its subclasses are sealed.
Lazily constructed internals: FieldwiseHasher and FieldwiseEquality "work" on almost all types, including types with private members in other assemblies. However, if you don't know the internals, you can't be sure what's being included in the equality computations. In particular, if an object is lazily initialized, two semantically equivalent objects might compute as unequal simply because one is initialized and the other is not. In practice this is rarely a problem.
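The inheritance pitfall described above can be reproduced without the library. The following sketch uses hand-written stand-ins (the hypothetical types Point2D and LabeledPoint, and hand-written BaseEquals and BaseHash functions in place of generated code) to show how equality defined at the base-class level silently ignores a field added by a subclass.

```csharp
using System;

// Hand-written stand-ins for illustration only; no ValueUtils types are used here.
public class Point2D
{
    public int X, Y;

    // Equality and hashing defined at the Point2D level: only X and Y are visible.
    public static bool BaseEquals(Point2D a, Point2D b) => a.X == b.X && a.Y == b.Y;
    public static int BaseHash(Point2D p) => unchecked((p.X * 397) ^ p.Y);
}

public class LabeledPoint : Point2D
{
    public string Label; // new field, invisible to the base-level functions
}

public static class InheritanceGotcha
{
    public static bool Demo()
    {
        var a = new LabeledPoint { X = 1, Y = 2, Label = "start" };
        var b = new LabeledPoint { X = 1, Y = 2, Label = "end" };
        // True even though the labels differ: Label is never compared.
        return Point2D.BaseEquals(a, b) && Point2D.BaseHash(a) == Point2D.BaseHash(b);
    }
}
```

Sealing the value type, as ValueObject requires of its subclasses, rules this situation out.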
Performance and hash-quality
TL;DR: ValueObject<> usually outperforms alternatives such as Tuple<>, struct and anonymous types. Compared to hand-rolled implementations, common operations such as .ToDictionary are around 15-25% slower (if your object contains "expensive" data such as large strings, the difference will become a lot smaller).
All performance measurements were done on an i7-4770k at a fixed clock rate of 4.0GHz. Timings are in nanoseconds per object. Datasets are all approximately 3,000,000 objects in size. Loops over the dataset were repeated until 10 seconds were up, and then the average of the fastest quartile was reported (this minimizes interference by other processes on my dev machine, since random interference almost always makes timings worse, not better). Some hash generators (notably struct) are so poor that this wasn't feasible; those timings are omitted (NaN) below.
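As an illustration of the reporting step, here is a minimal sketch of a fastest-quartile average. It is one interpretation of the description above, not the benchmark's actual code; the names TimingStats and FastestQuartileAverage are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TimingStats
{
    // Averages the fastest 25% of the samples (always at least one sample).
    public static double FastestQuartileAverage(IEnumerable<double> nanosPerObject)
    {
        var sorted = nanosPerObject.OrderBy(t => t).ToArray();
        if (sorted.Length == 0)
            throw new ArgumentException("At least one timing is required.", nameof(nanosPerObject));

        int take = Math.Max(1, sorted.Length / 4);
        return sorted.Take(take).Average();
    }
}
```

Discarding the slower three quarters of the runs drops the samples most likely to have been disturbed by other processes.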
Note that even a perfect hash mix is expected to have 0.03-0.04% colliding buckets, so if you see numbers like that in the data below, the hash is functioning as expected. Numbers better (lower) than that are actually worrisome, because that means some kind of structure in the input is being exploited, and that likely means similar but slightly different data exists that will have lots of collisions. And of course, numbers much higher than that directly impact performance.
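The 0.03-0.04% figure can be checked with a short calculation. The sketch below assumes that hash codes are drawn uniformly at random from the 2^32 possible int values and that the collision percentage is the share of objects that do not end up with a hash code of their own (total minus distinct, divided by total). The names HashCollisionEstimate and ExpectedCollisionFraction are hypothetical.

```csharp
using System;

public static class HashCollisionEstimate
{
    // Expected fraction of itemCount objects that do not get a distinct hash code
    // when hash codes are uniformly random over all 2^32 int values.
    public static double ExpectedCollisionFraction(long itemCount)
    {
        const double buckets = 4294967296.0; // 2^32 possible int hash codes

        // Expected number of distinct hash codes:
        // m * (1 - (1 - 1/m)^n), approximated as m * (1 - e^(-n/m)).
        double expectedDistinct = buckets * (1.0 - Math.Exp(-itemCount / buckets));
        return (itemCount - expectedDistinct) / itemCount;
    }
}
```

For datasets of about 3,000,000 objects this gives roughly 0.035%, which is where the expected range quoted above comes from.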
Quite a few tests use a simple pair of ints - this is relevant because this is pretty much a worst case for ValueObject. Although the generated code is fast, calling into that code requires a cast and a Delegate call, and those are (relatively) expensive operations in .NET - at least, compared to simple integer math that a pair-of-ints hashcode requires. With more complicated objects containing reference types the cost of the hashcode computation will start to matter more, and the overhead less.
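To make the comparison concrete, this is roughly what a hand-rolled pair of ints looks like. It is a hypothetical sketch for illustration; the type name IntPair and the particular hash formula are assumptions, not the code used in the benchmarks below.

```csharp
using System;

// Hypothetical hand-rolled value type; not the benchmark's actual implementation.
public sealed class IntPair : IEquatable<IntPair>
{
    public readonly int A, B;

    public IntPair(int a, int b) { A = a; B = b; }

    // Two int comparisons after a null check: no cast, no delegate call.
    public bool Equals(IntPair other) =>
        !ReferenceEquals(other, null) && A == other.A && B == other.B;

    public override bool Equals(object obj) => Equals(obj as IntPair);

    // One multiplication and one XOR.
    public override int GetHashCode() => unchecked((A * 397) ^ B);
}
```

With so little work per call, the fixed cost of a cast plus a delegate invocation dominates, which is why this case shows the largest relative overhead.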
Realistic scenario with an enum, a string, a DateTime, an int? and 3 int fields.

Name | Collisions | Distinct Hashcodes | .ToDictionary() (ns) | .Distinct().Count() (ns) | .Equals() (ns) | .GetHashCode() (ns) |
---|---|---|---|---|---|---|
ComplicatedManual | 0.04% | 2912961 / 2914000 | 218.8 | 199.6 | 6.9 | 17.4 |
ComplicatedValueObject | 0.04% | 2912977 / 2914000 | 250.2 | 230.5 | 21.4 | 42.1 |
Tuple | 0.03% | 2913001 / 2914000 | 482.2 | 494.8 | 257.6 | 263.5 |
ComplicatedStruct | 100.00% | 2 / 2914000 | NaN | NaN | 1002.3 | 97.2 |
Anonymous Type | 0.03% | 2913022 / 2914000 | 261.0 | 247.8 | 31.5 | 52.9 |

A simple pair of ints

Name | Collisions | Distinct Hashcodes | .ToDictionary() (ns) | .Distinct().Count() (ns) | .Equals() (ns) | .GetHashCode() (ns) |
---|---|---|---|---|---|---|
IntPairManual | 0.02% | 2975318 / 2976000 | 159.3 | 133.6 | 3.8 | 1.7 |
IntPairValueObject | 0.03% | 2974963 / 2976000 | 199.3 | 181.9 | 20.0 | 16.9 |
Tuple | 38.31% | 1835788 / 2976000 | 353.7 | 289.2 | 98.4 | 54.9 |
IntPairStruct | 56.61% | 1291168 / 2976000 | 864.7 | 812.6 | 31.4 | 36.8 |
Anonymous Type | 4.69% | 2836344 / 2976000 | 185.2 | 158.2 | 15.3 | 13.5 |

Two ints with both the same value

Name | Collisions | Distinct Hashcodes | .ToDictionary() (ns) | .Distinct().Count() (ns) | .Equals() (ns) | .GetHashCode() (ns) |
---|---|---|---|---|---|---|
IntPairManual | 0.37% | 2988915 / 3000000 | 188.5 | 155.6 | 3.6 | 1.7 |
IntPairValueObject | 0.03% | 2999012 / 3000000 | 200.8 | 194.6 | 19.7 | 16.5 |
Tuple | 22.07% | 2337827 / 3000000 | 145.2 | 140.5 | 76.4 | 55.1 |
IntPairStruct | 100.00% | 1 / 3000000 | NaN | NaN | 31.0 | 36.7 |
Anonymous Type | 0.00% | 3000000 / 3000000 | 144.5 | 106.3 | 12.1 | 13.5 |

Two ints such that (x,y) is present if and only if (y,x) is present in the dataset

Name | Collisions | Distinct Hashcodes | .ToDictionary() (ns) | .Distinct().Count() (ns) | .Equals() (ns) | .GetHashCode() (ns) |
---|---|---|---|---|---|---|
IntPairManual | 0.62% | 3014881 / 3033584 | 154.3 | 140.5 | 3.6 | 1.7 |
IntPairValueObject | 0.03% | 3032561 / 3033584 | 192.8 | 174.6 | 19.6 | 17.0 |
Tuple | 41.47% | 1775545 / 3033584 | 457.5 | 432.1 | 76.0 | 54.8 |
IntPairStruct | 74.50% | 773500 / 3033584 | 804.6 | 775.8 | 31.0 | 36.6 |
Anonymous Type | 0.79% | 3009536 / 3033584 | 175.2 | 161.2 | 12.1 | 13.5 |

A reference to the type itself and two int fields. The dataset contains exactly one level of nesting such that the outer object is (x,y) when the inner is (y,x).

Name | Collisions | Distinct Hashcodes | .ToDictionary() (ns) | .Distinct().Count() (ns) | .Equals() (ns) | .GetHashCode() (ns) |
---|---|---|---|---|---|---|
NastyNestedManual | 24.14% | 2267216 / 2988648 | 225.1 | 181.5 | 6.3 | 5.0 |
NastyNestedValueObject | 0.03% | 2987634 / 2988648 | 239.7 | 208.8 | 30.9 | 33.0 |
Tuple | 57.80% | 1261193 / 2988648 | 489.5 | 491.8 | 103.3 | 132.0 |