Obfuscated code is source code or machine code that has been made difficult for humans to understand. Programmers may deliberately obfuscate code to conceal its purpose (security through obscurity), to deter reverse engineering, or as a puzzle or recreational challenge for someone reading the source code. Programs known as obfuscators transform readable code into obfuscated code using various techniques.
Overview
The architecture and characteristics of some languages may make them easier to obfuscate than others. C, C++, and Perl are some examples of languages that are easy to obfuscate.
Recreational obfuscation
Writing and reading obfuscated source code can be a brain teaser for programmers. Programming contests have rewarded the most creatively obfuscated code. Examples include the International Obfuscated C Code Contest and the Obfuscated Perl Contest, which was held annually from 1996 to 2000.
Types of obfuscation include simple keyword substitution, the use or non-use of whitespace to create artistic effects, and self-generating or heavily compressed programs.
Short obfuscated Perl programs are sometimes used in the signature blocks of Perl programmers. These are known as JAPHs ("Just another Perl hacker").
Examples
The following is a winning entry from the International Obfuscated C Code Contest, written by Ian Phillipps in 1988 and subsequently reverse engineered by Thomas Ball.
/*
 * LEAST LIKELY TO COMPILE SUCCESSFULLY:
 * Ian Phillipps, Cambridge Consultants Ltd., Cambridge, England
 */
It is a C program that, when compiled and run, prints the lyrics of "The Twelve Days of Christmas". It contains all the strings required for the poem in an encoded form within the code.
The next example, a non-winning entry from the same year, illustrates creative use of whitespace. It generates mazes of arbitrary length:
char*M,A,Z,E=40,J[40],T[40];main(C){for(*J=A=scanf(M="%d",&C);
--            E;             J[              E]             =T
[E   ]=  E)   printf("._");  for(;(A-=Z=!Z)  ||  (printf("\n|"
)    ,   A    =              39              ,C             --
)    ;   Z    ||    printf   (M   ))M[Z]=Z[A-(E   =A[J-Z])&&!C
&    A   ==             T[                                  A]
|6<<27<rand()||!C&!Z?J[T[E]=T[A]]=E,J[T[A]=A-Z]=A,"_.":" |"];}
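Modern C implementations typically place string literals in read-only memory, and overwriting one is undefined behavior. As a result, this program may crash when it writes to the string that "M" points to. This can be avoided by changing "*M" to "M[3]" and omitting "M=".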
The following example by Óscar Toledo Gutiérrez, a Best of Show entry in the 19th IOCCC, implements an 8080 emulator complete with a terminal and disk controller. It is capable of booting CP/M-80 and running CP/M applications:
#include <stdio.h>
#define n(o,p,e)=y=(z=a(e)%16 p x%16 p o,a(e)p x p o),h(
#define s 6[o]
#define p z=l[d(9)]|l[d(9)+1]<<8,1<(9[o]+=2)||++8[o]
#define Q a(7)
#define w 254>(9[o]-=2)||--8[o],l[d(9)]=z,l[1+d(9)]=z>>8
#define O )):((
#define b (y&1?~s:s)>>"\6\0\2\7"[y/2]&1?0:(
#define S )?(z-=
#define a(f)*((7&f)-6?&o[f&7]:&l[d(5)])
#define C S 5 S 3
#define D(E)x/8!=16+E&198+E*8!=x?
#define B(C)fclose((C))
#define q (c+=2,0[c-2]|1[c-2]<<8)
#define m x=64&x?*c++:a(x),
#define A(F)=fopen((F),"rb+")
unsigned char o[10],l[78114],*c=l,*k=l
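#define d(e)o[e]+256*o[e-1]
#define h(l)s=l>>8&1|128&y|!(y&255)*64|16&z|2,y^=y>>4,y^=y<<2,y^=~y>>1,s|=y&4
+64506; e,V,v,u,x,y,z,Z; main(r,U)char**U;{
for(v A((u A((e A((r-2?0:(V A(1[U])),"C")
),system("stty raw -echo min 0"),fread(l,78114,1,e),B(e),"B")),"A")); 118-(x=*c++); (y=x/8%8,z=(x&199)-4 S 1 S 1 S 186 S 2 S 2 S 3 S 0,r=(y>5)*2+y,z=(x&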
207)-1 S 2 S 6 S 2 S 182 S 4)?D(0)D(1)D(2)D(3)D(4)D(5)D(6)D(7)(z=x-2 C C C C
C C C C+129 S 6 S 4 S 6 S 8 S 8 S 6 S 2 S 2 S 12)?x/64-1?((0 O a(y)=a(x) O 9
[o]=a(5),8[o]=a(4) O 237==*c++?((int (*)())(2-*c++?fwrite:fread))(l+*k+1[k]*
256,128,1,(fseek(y=5[k]-1?u:v,((3[k]|4[k]<<8)<<7|2[k])<<7,Q=0),y)):0 O y=a(5
),z=a(4),a(5)=a(3),a(4)=a(2),a(3)=y,a(2)=z O c=l+d(5) O y=l[x=d(9)],z=l[++x]
,x[l]=a(4),l[--x]=a(5),a(5)=y,a(4)=z O 2-*c?Z||read(0,&Z,1),1&*c++?Q=Z,Z=0:(
Q=!!Z):(c++,Q=r=V?fgetc(V):-1,s=s&~1|r<0) O++c,write(1,&7[o],1) O z=c+2-l,w,
c=l+q O p,c=l+z O c=l+q O s^=1 O Q=q[l] O s|=1 O q[l]=Q O Q=~Q O a(5)=l[x=q]
,a(4)=l[++x] O s|=s&16|9<Q%16?Q+=6,16:0,z=s|=1&s|Q>159?Q+=96,1:0,y=Q,h(s<<8)
O l[x=q]=a(5),l[++x]=a(4) O x=Q%2,Q=Q/2+s%2*128,s=s&~1|x O Q=l[d(3)]O x=Q /
128,Q=Q*2+s%2,s=s&~1|x O l[d(3)]=Q O s=s&~1|1&Q,Q=Q/2|Q<<7 O Q=l[d(1)]O s=~1
&s|Q>>7,Q=Q*2|Q>>7 O l[d(1)]=Q O m y n(0,-,7)y) O m z=0,y=Q|=x,h(y) O m z=0,
y=Q^=x,h(y) O m z=Q*2|2*x,y=Q&=x,h(y) O m Q n(s%2,-,7)y) O m Q n(0,-,7)y) O
m Q n(s%2,+,7)y) O m Q n(0,+,7)y) O z=r-8?d(r+1):s|Q<<8,w O p,r-8?o[r+1]=z,r
[o]=z>>8:(s=~40&z|2,Q=z>>8) O r[o]--||--o[r-1]O a(5)=z=a(5)+r[o],a(4)=z=a(4)
+o[r-1]+z/256,s=~1&s|z>>8 O ++o[r+1]||r[o]++O o[r+1]=*c++,r[o]=*c++O z=c-l,w
,c=y*8+l O x=q,b z=c-l,w,c=l+x) O x=q,b c=l+x) O b p,c=l+z) O a(y)=*c++O r=y
,x=0,a(r)n(1,-,y)s<<8) O r=y,x=0,a(r)n(1,+,y)s<<8))));
system("stty cooked echo"); B((B((V?B(V):0,u)),v)); }
At best, obfuscation merely makes it time-consuming, but not impossible, to reverse engineer a program. In Java, it also limits the use of the Reflection API on the obfuscated code, since renamed classes and members can no longer be looked up by their original names.
Obfuscating software
A variety of tools exists to perform or assist with code obfuscation. These include experimental research tools created by academics, hobbyist tools, commercial products written by professionals, and open-source software.
There also exist deobfuscation tools that attempt to perform the reverse transformation.
Although most commercial obfuscation solutions work by transforming either program source code or platform-independent bytecode, as used by Java and .NET, some also work with C and C++, languages that are typically compiled to native code.
See also
AARD code
ActionScript code protection
Esoteric programming language
Polymorphic code
Hardware obfuscation