Trial #9: Filter Unique Elements in a Set


Problem

Using PowerShell, find and count the unique entries in a collection $set containing unsorted, repeated characters.

Solution

A first scan of solutions online turned up the following, where sort is the Windows PowerShell alias for Sort-Object.

$set | sort -unique

This wasn’t very quick on large sets, since it has to sort the whole collection as well as remove duplicates, so I went looking for a .NET data type specialized for unique values.

I found it in System.Collections.Generic.HashSet<T>, which PowerShell writes as HashSet[T].

It’s easy to convert types in PowerShell. The syntax is concise, and the result is much faster because the data never has to be sorted, e.g.

[System.Collections.Generic.HashSet[string]]$unique = $set
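
One thing worth checking before swapping one approach for the other: Sort-Object -Unique compares strings case-insensitively by default, while a HashSet[string] created by a cast uses the default ordinal, case-sensitive comparer, so the two counts can differ on character data. Below is a minimal sketch, assuming a hypothetical sample $set (the post does not show its contents) and PowerShell 5 or later for the ::new syntax.

```powershell
# Hypothetical sample input; the post does not show what $set contains.
$set = 'b', 'a', 'B', 'c', 'a', 'c'

# Sort-Object -Unique treats 'b' and 'B' as the same value by default.
$sortedCount = ($set | Sort-Object -Unique).Count

# Casting to HashSet[string] uses the default (ordinal, case-sensitive) comparer,
# so 'b' and 'B' are kept as two distinct entries.
[System.Collections.Generic.HashSet[string]]$caseSensitive = $set

# Passing a comparer to the constructor gives case-insensitive matching instead.
$caseInsensitive = [System.Collections.Generic.HashSet[string]]::new([string[]]$set, [System.StringComparer]::OrdinalIgnoreCase)

"Sort-Object -Unique      : $sortedCount"               # 3
"HashSet (default)        : $($caseSensitive.Count)"    # 4
"HashSet (ignoring case)  : $($caseInsensitive.Count)"  # 3
```

If case-sensitive counting is what you want from the sort-based version, Sort-Object also accepts a -CaseSensitive switch.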

Here is a little comparison on a relatively small input of 15512 items, built from three overlapping ascending ranges. Half a second isn’t so bad, but the HashSet alternative is about 21x faster, and that difference should grow with the size of the set.

$inputA = 1..9999 + 44..4444 + 7777..8888

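# Slow: sorts the whole collection, then drops duplicates
Measure-Command { ($inputA | Sort-Object -Unique).Count }
  TotalMilliseconds : 544.0192

# Fast: converts to a HashSet, which keeps only unique values and does not sort
Measure-Command { [System.Collections.Generic.HashSet[string]]$unique = $inputA; $unique.Count }
  TotalMilliseconds : 25.9092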
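
To check that both approaches agree and to see how the gap behaves as the input grows, something like the following could be used. It is only a sketch: the repetition factors (1, 2, 4) and the variable names are arbitrary choices, not from the measurements above, and timings will vary by machine and PowerShell version.

```powershell
# Same test data as in the post: three overlapping integer ranges.
$inputA = 1..9999 + 44..4444 + 7777..8888

# Both approaches should report the same number of unique values.
$sortCount = ($inputA | Sort-Object -Unique).Count
[System.Collections.Generic.HashSet[string]]$unique = $inputA
$hashCount = $unique.Count
"Sort-Object: $sortCount, HashSet: $hashCount"

# Time each approach on progressively larger inputs (factors are arbitrary).
$results = foreach ($copies in 1, 2, 4) {
    # Multiplying an array repeats it: $copies back-to-back copies of $inputA.
    $data = $inputA * $copies

    $sortMs = (Measure-Command { ($data | Sort-Object -Unique).Count }).TotalMilliseconds
    $hashMs = (Measure-Command {
        [System.Collections.Generic.HashSet[string]]$h = $data
        $h.Count
    }).TotalMilliseconds

    [pscustomobject]@{ Items = $data.Count; SortMs = $sortMs; HashSetMs = $hashMs }
}
$results | Format-Table
```

Since the values here are integers, declaring the set as HashSet[int] instead of HashSet[string] would avoid converting every element to a string, which may be worth measuring as well.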
