Schwartzian transform
Encyclopedia
In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, the Schwartzian transform is a Perl
Perl
Perl is a high-level, general-purpose, interpreted, dynamic programming language. Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions and become widely popular...

 programming idiom
Programming idiom
A programming idiom is a means of expressing a recurring construct in one or more programming languages. Generally speaking, a programming idiom is an expression of a simple task or algorithm that is not a built-in feature in the programming language being used, or, conversely, the use of an...

 used to improve the efficiency of sorting
Sorting
Sorting is any process of arranging items in some sequence and/or in different sets, and accordingly, it has two common, yet distinct meanings:# ordering: arranging items of the same kind, class, nature, etc...

 a list of items. This idiom is appropriate for comparison-based sorting when the ordering is actually based on the ordering of a certain property (the key) of the elements, where computing that property is an intensive operation that should be performed a minimal number of times. The Schwartzian Transform is notable in that it does not use named temporary arrays.

The idiom is named after Randal L. Schwartz
Randal L. Schwartz
Randal L. Schwartz , also known as merlyn, is an American author, system administrator and programming consultant.-Career:...

, who first demonstrated it in Perl
Perl
Perl is a high-level, general-purpose, interpreted, dynamic programming language. Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions and become widely popular...

 shortly after the release of Perl 5 in 1994. The term "Schwartzian Transform" applied solely to Perl programming for a number of years, but it has later been adopted by some users of other languages
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

, such as Python
Python (programming language)
Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...

, to refer to similar idioms in those languages. However, the algorithm was already in use in other languages (under no specific name) before it was popularized among the Perl community in the form of that particular idiom by Schwartz. The term "Schwartzian transform" indicates a specific idiom, and not the algorithm in general.

The Schwartzian Transform is a version of a Lisp
Lisp programming language
Lisp is a family of computer programming languages with a long history and a distinctive, fully parenthesized syntax. Originally specified in 1958, Lisp is the second-oldest high-level programming language in widespread use today; only Fortran is older...

 idiom known as decorate-sort-undecorate, which avoids recomputing the sort keys by temporarily associating them with the input items. This approach is similar to memoization
Memoization
In computing, memoization is an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously processed inputs...

, which avoids repeating the calculation of the key corresponding to a specific input value. By comparison, this idiom assures that each input item's key is calculated exactly once, which may still result in repeating some calculations if the input data contains duplicate items.

The Perl idiom

The general form of the Schwartzian Transform is:


@sorted = map { $_->[0] }
sort { $a->[1] cmp $b->[1] }
map { [$_, foo($_)] }
@unsorted;


Where foo($_) represents an expression that takes $_ (each item of the list in turn) and produces the corresponding value that is to be compared in its stead.

Reading from right to left (or from the bottom to the top):
  • the original list @unsorted is fed into a map operation that wraps each item into a (reference to an anonymous 2-element) array consisting of itself and the calculated value that will determine its sort order (list of item becomes a list of [item=>value]);
  • then the list of lists produced by map is fed into sort, which sorts it according to the values previously calculated (list of [item, value] => sorted list of [item, value]);
  • finally, another map operation unwraps the values (from the anonymous array) used for the sorting, producing the items of the original list in the sorted order (sorted list of [item, value] => sorted list of item).


The use of anonymous arrays ensures that memory will be reclaimed by the Perl garbage collector immediately after the sorting is done.

Efficiency analysis

Without the Schwartzian transform, the sorting in the example above would be written in Perl like this:

@sorted = sort { foo($a) cmp foo($b) } @unsorted;


While it is shorter to code, the naive approach here could be much less efficient if the key function (called foo in the example above) is expensive to compute. This is because the code inside the brackets is evaluated each time two elements needs to be compared. An optimal comparison sort
Comparison sort
A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation that determines which of two elements should occur first in the final sorted list...

 performs O(n log n) comparisons (where n is the length of the list), with 2 calls to foo every comparison, resulting in O(n log n) calls to foo. In comparison, using the Schwartzian transform, we only make 1 call to foo per element, at the beginning map stage, for a total of n calls to foo.

However, if the function foo is relatively simple, then the extra overhead of the Schwartzian transform may be unwarranted.

Example

For example, to sort a list of files by their modification times, a naive approach might be as follows:


function naiveCompare(file a, file b) {
return modificationTime(a) < modificationTime(b)
}

// Assume that sort(list, comparisonPredicate) sorts the given list using
// the comparisonPredicate to compare two elements.
sortedArray := sort(filesArray, naiveCompare)


Unless the modification times are memoized for each file, this method requires their re-computing every time a file is compared in the sort. Using the Schwartzian transform, the modification time is calculated only once per file.

A Schwartzian transform involves the functional idiom described above, which does not use temporary arrays.

The same algorithm can be written procedurally to better illustrate how it works, but this requires using temporary arrays, and is not a Schwartzian transform. The following example pseudo-code implements the algorithm in this way:


for each file in filesArray
insert array(file, modificationTime(file)) at end of transformedArray

function simpleCompare(array a, array b) {
return a[2] < b[2]
}

transformedArray := sort(transformedArray, simpleCompare)

for each file in transformedArray
insert file[1] at end of sortedArray

History

The first known online appearance of the Schwartzian Transform is a December 16, 1994 posting by Randal Schwartz to a thread in comp.unix.shell, crossposted to comp.lang.perl. (The current version of the Perl Timeline is incorrect and refers to a later date in 1995.) The thread began with a question about how to sort a list of lines by their "last" word:


adjn:Joshua Ng
adktk:KaLap Timothy Kwong
admg:Mahalingam Gobieramanan
admln:Martha L. Nangalama


Schwartz responded with:

  1. !/usr/bin/perl

require 5; # new features, new bugs!
print
map { $_->[0] }
sort { $a->[1] cmp $b->[1] }
map { [$_, /(\S+)$/] }
<>;


This code produces the result:


admg:Mahalingam Gobieramanan
adktk:KaLap Timothy Kwong
admln:Martha L. Nangalama
adjn:Joshua Ng


Schwartz noted in the post that he was "Speak[ing] with a lisp in Perl," a reference to the idiom's Lisp origins.

The term "Schwartzian Transform" itself was coined by Tom Christiansen in a followup reply. Later posts by Christiansen made it clear that he had not intended to name the construct, but merely to refer to it from the original post: his attempt to finally name it "The Black Transform" did not take hold ("Black" here being a pun on "schwar[t]z", which means black in German).

Comparison to other languages

Some other languages provide a convenient interface to the same optimization as the Schwartzian transform:
  • In Python
    Python (programming language)
    Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...

     2.4 and above, both the sorted function and the in-place list.sort method take a key= parameter that allows the user to provide a "key function" (like foo in the examples above). In Python 3 and above, use of the key function is the only way to specify a custom sort order (the previously-supported comparator argument was removed).
  • In Ruby
    Ruby (programming language)
    Ruby is a dynamic, reflective, general-purpose object-oriented programming language that combines syntax inspired by Perl with Smalltalk-like features. Ruby originated in Japan during the mid-1990s and was first developed and designed by Yukihiro "Matz" Matsumoto...

     1.8.6 and above, the Enumerable abstract class (which includes Arrays) contains a sort_by method which allows you to specify the "key function" (like foo in the examples above) as a code block.
  • In D
    D (programming language)
    The D programming language is an object-oriented, imperative, multi-paradigm, system programming language created by Walter Bright of Digital Mars. It originated as a re-engineering of C++, but even though it is mainly influenced by that language, it is not a variant of C++...

     2 and above, the schwartzSort function is available. It might require less temporary data and be faster than the Perl idiom or the decorate-sort-undecorate idiom present in Python and Lisp. This is because sorting is done in-place and only minimal extra data (one array of transformed elements) is created.
  • Racket's core sort function accepts a #:key keyword argument with a function that extracts a key, and an additional #:cache-keys? requests that the resulting values are cached during sorting. For example, a convenient way to shuffle a list is (sort l < #:key (λ (_) (random)) #:cache-keys? #t).

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK