Natural String Sorting

Natural String Sorting

Have you ever desired to have in your code a way to order a sequence of strings in the same way as Windows does for files whose name contains a mix of letters and numbers? Natural string sorting is not natively supported in .NET but can be easily implemented by specialising a string comparer and adding a few extensions to the enumerable string collection.

 

The ordinal order

Ordinal numbers are the words representing the rank of a number with respect to some order, in particular order or position (i.e. first, second, third, etc.).

As humans, we learn to put numbers in the correct order as a primary skill, according to their ordinality. We can also order strings containing letters and numbers very easily just by looking at the alphabetical order of letters and the ordinality of a number. For example, the string “Year 1” precedes “Year 2”, which also comes before “Year 10” in this natural order. Basically, we know how to separate the alphabetical part of the sentence from the numerical one, and then we are able to sort the full string by applying two different “sorting algorithms” (alphabetical and ordinal, respectively). Variations of these algorithms may apply in different cultures, but the principle is the same.

Computers, on the converse, use the ASCII table for sorting strings, from the dawn of the informatics era. This table presents a sequence of letters, numbers, punctuation and control characters in a fixed order that tends to mirror the natural order of the alphabet and numbers, and associates every entry in the table (a character) to a number. For example, number 65 corresponds to character “A”, “B” is 66, “C” is 67 and so on. Likewise, also characters 0 to 9 have an ASCII number: “0” is 48, “1” is 49 until “9”, which is 57. Numbers after 9 corresponds to two or more characters (10 is “1” and “0”).

Sorting characters according to the ASCII table, which is what computers do, basically implies sorting each character individually according to their position in the table: same “sorting algorithm” irrespective of the type of the character, whether letter, number, punctuation character or other.

By applying this logic, strings are compared character by character and then sorted according to the numerical position of the character in the ASCII table. The string “Year 1” would still be ordered before “Year 2”, as character “1” appears before “2” in the table. But “Year 10” is before “Year 2” because the “1” character in “10” is compared first with “2” and, again, “1” takes precedence over “2”, irrespective of the following characters.

 

Computers use the ASCII table for sorting strings

 

Default string sorting in .NET

Let’s make an example in C# of displaying a sequence of strings in the standard order supported by the OrderBy() LINQ extension to enumerable collections.

var files = new List<string>

{

   "File 1.txt",

   "File 2.jpg",

   "File 3.doc",

   "File 10.txt",

   "File 11.csv",

   "File 20.xls",

   "File 21.ppt"

};

 

foreach (var file in files.OrderBy(n => n))

{

   Console.WriteLine(file);

}

String ordering with LINQ

 

This simple snippet of C# code orders all strings in the “files” list by using the OrderBy() LINQ extension, and then prints the result on the output console. As it is visible in the following picture, the output of this operation is a list of file names sorted according to the ASCII rule of order of characters.

Output of the ordering program

 

Not the result we would like to have, and definitely not so user-friendly. So, what can we do about it?

 

Natural string sorting

Sticking to the example of the file names, Windows already supports natural sorting of files by name, as in the following screenshot.

Natural file sorting by name in Windows

 

Much better, isn’t it? So, Windows is clever enough to think like a human and understand the alphabetical and numerical part of a string separately. Can we do the same in C#?

The approach to take for introducing natural ordering of a collection of strings in C# starts from the implementation of a string comparer that contains the logic for understanding what is alphabetical and what is numerical within a string. After that, we would want to introduce a few extensions in the same style as LINQ does, for having a shortcut in an enumerable collection of strings to the natural sorting methods.

 

The NaturalStringComparer class

We have seen in the previous example that the easiest way to sort a collection of strings is to use the LINQ extension OrderBy() on an enumerable object. The OrderBy() method has an overload that accepts an IComparer<TKey> for specifying a bespoke way of ordering objects of type TKey.

The signature of this overload is:

public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(

   this IEnumerable<TSource> source,

   Func<TSource, TKey> keySelector,

   IComparer<TKey> comparer);

 

The key to perform custom comparisons of strings is to implement the IComparer interface, then. This interface exposes only one method, Compare, with the following signature:

int Compare(T x, T y);

 

The implementation of Compare in our custom comparer class should perform a comparison of the two objects and return a value indicating whether one is less than, equal to, or greater than the other. A value less than zero is returned if x is less than y; zero is returned is x equals y; a value greater than zero is returned if x is greater than y. Comparing null with any reference type is allowed and does not generate an exception. A null reference is considered to be less than any reference that is not null.

public class NaturalStringComparer : IComparer<string>

{

   private bool _ignoreCase = true;

 

   public NaturalStringComparer() : this(true)

   {

   }

 

   public NaturalStringComparer(bool ignoreCase)

   {

      _ignoreCase = ignoreCase;

   }

 

   public int Compare(string x, string y)

   {

      // full implementation on the attached solution

   }

}

Implementation of a natural string comparer in C#

 

The NaturalStringComparer implements the IComparer<string> interface, and specifically its method Compare(), and also presents an overloaded constructor for discerning between a case-sensitive or insensitive comparison. Clever! J

 

Extensions

We can already use our NaturalStringComparer when ordering a collection of strings via the OrderBy() extension; with reference to the list of file names presented at the beginning of this article, we can simply use the overload of the OrderBy() method that accepts an IComparer in input for obtaining the desired result:

foreach (var file in files.OrderBy(n => n, new NaturalStringComparer()))

{

   Console.WriteLine(file);

}

 

This generates the following result, which is exactly what we want:

Output of the natural ordering program

 

If we were to order the file names in descending order, we would use the OrderByDescending() extension with the explicit call to NaturalStringComparer:

foreach (var file in files.OrderByDescending(n => n, new NaturalStringComparer()))

{

   Console.WriteLine(file);

}

 

What about a more concise way of ordering a collection of strings by adding two more extensions?

·         OrderByNatural

·         OrderByNaturalDescending

The signature for these extensions, and their overload for allowing case-sensitive sorting, is the following; remember that extension methods must be declared as public static in a static class:

public static class StringExtensions

{

   public static IOrderedEnumerable<string> OrderByNatural(

      this IEnumerable<string> strings,

      Func<string, string> keySelector);

 

   public static IOrderedEnumerable<string> OrderByNatural(

      this IEnumerable<string> strings,

      Func<string, string> keySelector,

      bool ignoreCase);

 

   public static IOrderedEnumerable<string> OrderByNaturalDescending(

      this IEnumerable<string> strings,

      Func<string, string> keySelector);

 

   public static IOrderedEnumerable<string> OrderByNaturalDescending(

      this IEnumerable<string> strings,

      Func<string, string> keySelector,

      bool ignoreCase);

}

LINQ-style extensions for ordering strings like human beings do

 

Internally, each method invokes the OrderBy() and OrderByDescending() extensions accordingly, with an explicit call to the NaturalStringComparer class, using the respective overload for ignoring case-sensitive comparison. So, in our file names example, sorting a list of strings by ordinal order would be as simple as:

foreach (var file in files.OrderByNatural(n => n))

{

   Console.WriteLine(file);

}

 

Clever 2.0! J

 


  Comments

 

 Source Code

 Related Content
A flexible Default Value for your DateTime properties
When creating an MVC application with Entity Framework, it is possible to set default values for most properties in a model using the DefaultValue attribute. However, no much flexibility is offered for a DateTime property. This article presents a custom validation attribute for DateTime types that accepts different formats for defining the default value of the property.
Adding a Secured Geo-located Audit Trail
How I built a social sharing component for my own web site and added a secured geo-located audit trail. Step by step, let’s analyse technologies and source code for developing this component.
Adding Social Sharing to a Web Site
How I built a social sharing component for my own web site and added a secured geo-located audit trail. Step by step, let’s analyse technologies and source code for developing this component.
Best practices for mobile form design in SharePoint
Build effective SharePoint forms with Nintex that are accessible anywhere, at any time, and on any device. You built the workflows, you built the forms, now make them mobile.
Bring your “A” game to ESPC16
With just over 3 weeks to go to Europe's largest gathering of SharePoint & Office 365 professionals, take a look at these tips that will help you get the most out of ESPC16…
Building an MVC application for SharePoint
Learn how to write code to perform basic operations with the SharePoint 2013 .NET Framework client-side object model (CSOM), and build an ASP.NET MVC application that retrieves information from a SharePoint server.
CIO vs CTO
What are the synergies and differences of the roles of a Chief Information Officer and a Chief Technology Officer? An open conversation about two roles with one mission…
Coded UI test automation of MVC applications with Visual Studio
Whether you are a software developer, tester, administrator or analyst, this article can help you master different types of UI testing of an MVC application, by using Visual Studio for creating coded UI test suites that can be automated for continuous execution.
Converting GIS spatial coordinates
Different formats and standards exist for describing geographical coordinates in GIS systems and applications. This article explains how to convert between the most used formats, presenting a working library written in C#.
Creating mobile accessible forms in SharePoint
With the release of the Nintex Mobile apps, SharePoint users can now optimise their experience across popular mobile devices and platforms.
Define your Performance Testing strategy with Visual Studio
Performance Testing is an essential part of software testing, with the specific goal of determining how a system performs in terms of responsiveness and stability under a particular workload. In this series of posts we’ll define and execute a good strategy for testing performance of an application using Visual Studio.
Disserting about colliding GUIDs and the Big Bang Theory
Can you generate two identical GUIDs? Would the world end if two GUIDs collide? How long does it take to generate a duplicate GUID and would we be still here when the result is found?
GIS Location Services in Dynamics CRM
A design paper about implementing GIS-based services for a local Council in Dynamics CRM, structuring address data, and delivering location services in the form of WebAPI endpoints via an Enterprise Service Bus.
Group or Team?
All teams are groups but not all groups are teams. What defines a group and what a team? When do we need one over the other one?
How to Give Constructive Feedback
Learning to give and receive constructive feedback is an essential part of learning, growing, improving and achieving our goals.
Mirroring an iPad on your laptop
Have you ever wanted to see your iPhone or iPad on a larger screen? Play games, watch movies, demo apps or present to your computer from your iPhone or iPad. Reflector mirrors iOS devices on big screens wirelessly using iOS built-in AirPlay mirroring.
Mobilize your SharePoint workflows
Build workflow applications in SharePoint that can be accessed on mobile devices using the Nintex solution for business process mobilization.
Sales Effectiveness in Dynamics CRM with Azure IoT and Machine Learning - Part 1
How can an organisation optimise its sales channels and product targeting by building a 365-degree view of its customers in Dynamics CRM? The answer, and topic of this article, is with the help of Azure IoT and Machine Learning services!
Scaling Applications with Azure Redis and Machine Learning - Part 1
This article presents design best practices and code examples for implementing the Azure Redis Cache and tuning the performance of ASP.NET MVC applications, optimising cache hit ratio and reducing “miss rate” with smart algorithms processed by Machine Learning.
Software Development Management in 11 steps
What it takes to be a great Software Development Manager? I have been building software for the last 15 years and have collected a few stories and experiences to share. Some I have used as questions when interviewing candidates. In 11 points, this is my story to date.
SOLID SharePoint apps with MVC
Practical code examples of ASP.NET MVC applications that connect to a SharePoint Server and comply with the SOLID principles.
The art of outsourcing projects
Outsourcing may look financially attractive, but working with companies in far-off lands introduces challenges that, if not considered properly, can drive any project to failure. Let’s explore some common pitfalls when working with offshore partners and a common-sense approach to work around them.
The value of hashtags
Customers expect a modern approach to advertising. Digital advertising can leverage evolving technology to provide just-in-time, just-at-the-right-place promotions.
We don't need no Yoda's syntax
There is an urban myth in the programmers’ community that the so called “Yoda’s syntax” performs better when checking an object for nullity. Let's demystify it...