Friday, November 19, 2010

Boxing and Unboxing

Boxing is a boring sport, but in the exciting and melodramatic world of programming, boxing is the process of converting a value type to a reference type, or converting a basic datatype (like an integer) to an object.

Unboxing is the opposite of boxing; that is, converting an object back to a value type.
Here is an example off the top of my head:
Boxing:
int x = 5;
object k = x; // or object k = (object) x for explicit boxing

Unboxing:
int z;
z = (int) k; // k is our boxed value; note that unboxing requires an explicit cast

While boring real-life boxing can make you look like a douche (go for Jujitsu instead), boxing in the programming realm can make you look like a life-saving wizard, especially when working in managed programming languages.

 

 

You're doing it wrong.

Boxing is needed when a parameter of a function you need to call is an object, while what you have is a trivial datatype. Want a simple example? Sometimes you need to pass an int value as a string, so you use the ToString() method (strictly speaking that's a conversion rather than boxing, but it's the same spirit).

A complex example of boxing usage (besides beating people up) involves boxing trivial types as objects in order to be referenced as delegates.

Now, what does this have to do with efficiency, you say? Well, boxing is evil as a programming practice and should be used sparingly. When you box a value, the memory usage can be 20 times that of the trivial datatype, and the process of boxing and unboxing wastes a considerable number of CPU cycles compared to handling trivial datatypes directly.

 

"Convert THIS to String!"

In the old days of .NET, people had to use boxing a lot because of ArrayLists, but that is no longer needed since you can use generic collections.

The Illusion of Performance Hacks

Don't try a performance tip until you learn how your compiler of choice works with that type of optimization; you might become another victim of premature optimization which, according to Donald Knuth, is the root of all evil. (In addition to Teletubbies; they scare the living hell out of me.)

I'll give you an example:
The XOR swap algorithm (really interesting stuff here) is a method to swap the values of two variables without using a third, temporary variable! Great, we just saved the overhead incurred by creating a third variable, right? RIGHT?



Well, I offer my deep condolences, but not in all cases; actually, not in most cases. Modern compilers are very, very good at optimization themselves, and using hacks like that can defeat the compiler's own optimization techniques, rendering them useless.
In our case, XOR swapping is done like this:

char a = 'A';
char b = 'B';
a ^= b; // this means a = a ^ b;
b ^= a;
a ^= b;

The result is that the values are swapped between the two variables in the same number of operations without using a third variable, which sounds like a good performance catch...

... Except the compiler already does its own optimization whenever it recognizes a swapping algorithm, using the same number of registers as the XOR swap, all while performing the swap through the registers faster than pushing the XORs through the ALU! Actually, some CPU architectures support swapping internally (simply exchanging the memory addresses of the variables) or, where available, a single XCHG (exchange) instruction. Yes, compilers are cool like that.

Another reason the XOR swap can be a bad optimization is that modern processors are pipelined, but the XOR swap's data flow forces the processor to perform the XORs sequentially, because each step depends on the result of the previous one.

If you use XOR swap instead of regular swap, the compiler will not be able to use its own optimized swap. 

There is another reason that will surely have programmers pulling their hair out if they use the XOR swap trick. If you XOR-swap a variable with itself (the same memory location, for instance swapping arr[i] with arr[j] when i == j), you lose the data! XORing something with itself equals zero, and every later step just shuffles that zero around. (Two distinct variables that merely hold equal values swap fine.) Traditional swap doesn't have this problem.
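A quick sketch of both outcomes (in Python for brevity; the bitwise arithmetic is the same in C), with a hypothetical xor_swap helper operating on array slots:

```python
def xor_swap(arr, i, j):
    # Classic XOR swap of two array slots, no temporary variable.
    arr[i] ^= arr[j]
    arr[j] ^= arr[i]
    arr[i] ^= arr[j]

vals = [5, 9]
xor_swap(vals, 0, 1)
print(vals)  # [9, 5] -- distinct slots swap correctly

same = [7]
xor_swap(same, 0, 0)
print(same)  # [0] -- aliased slots: 7 ^ 7 == 0, the value is destroyed
```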

Another similar hack tries to swap without a third variable by using:

a = a - b;
b = a + b;
a = -1 * (a - b);

This still suffers from the same aliasing problem (and, in C, it additionally risks signed-integer overflow). The reality is that the compiler is a lot better than most programmers at optimization, and the best approach before implementing an optimization trick is to research it first, see how the compiler handles such a case, and then do some sweet, sweet benchmarking.
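The same aliasing failure is easy to demonstrate for the arithmetic version (again sketched in Python with a hypothetical helper):

```python
def arith_swap(arr, i, j):
    # Swap via addition/subtraction, no temporary variable.
    arr[i] = arr[i] - arr[j]
    arr[j] = arr[i] + arr[j]        # now holds the old arr[i]
    arr[i] = -1 * (arr[i] - arr[j])  # now holds the old arr[j]

vals = [3, 8]
arith_swap(vals, 0, 1)
print(vals)  # [8, 3]

same = [4]
arith_swap(same, 0, 0)
print(same)  # [0] -- aliased slots wipe the value, just like XOR swap
```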

 

To cut to the chase: unless you are experiencing extreme register spills (CPU register shortages) or developing for microcontrollers and smart cards with very limited RAM, you should use the regular swap algorithm.

Assembly language nuts might respond by reminding us that the xor instruction is often used to zero out a register, and they are correct. To zero out a register, it is more efficient (especially in code size) to xor it with itself than to use a mov instruction, and it is especially useful when writing shellcode, since you avoid the null bytes that break the vulnerable string functions you may use to inject the shellcode.

 

But we're not talking about zeroing out registers here, are we? We're talking about swapping. The two cases are disjoint and unrelated.

 

Trivia: Can you tell why xor is used to zero out a register instead of subtracting it by itself with a sub instruction?


Cracking SHA-1 and the Game of Tom & Jerry

Let me start bluntly by saying that if you use md4/md5 hashes, then you're faint of heart and this post is definitely not for you.

SHA-1 is a big improvement over MD5 in terms of collisions, and hashes generated using it are much harder to crack than MD5 hashes.

Unfortunately, that is about to change dramatically in a way that makes it very easy for attackers to crack SHA-1 hashes, while making it more difficult for victims to 'get with the times' and use safer alternatives, what with fragile web app architectures and poor, poor extensibility in widely used web apps.

I'm talking about how easy it has become now to crack a SHA-1 hash using the cloud. Amazon now provides EC2 Clusters with GPU powers. Yes, rental of machines with superior GPU-computation abilities, and guess which pieces of hardware are extremely good at cracking hashes? Yup, GPUs.

If I'm not mistaken, it costs as little as 2 dollars for an hour of GPU-hotness, which is more than enough to crack a certain hash that's driving you crazy.

So what is the solution, you might ask? Here is where the proverbial game of cat and mouse comes into play. SHA-1 hashes were considered secure because it is difficult for an attacker to brute-force the inverse of the hash, that is, to reverse the hashing operation and recover the original text (in this case the password plaintext). But now, with the 'increased' use of the cloud, and perhaps many botnet herders wishing to utilize the GPUs of their zombies and bots, the effort needed is well within the SHA-1 cracker's grasp.

So you need to up the ante. You need to make it even harder to crack your hashes while still keeping it reasonable and affordable to create the hash yourself. You don't want the user waiting three minutes while his password is being hashed, unless you're the BOFH, which is fine, but let's not stray far from the point:
Use algorithms that put more work into creating the hash, and therefore orders of magnitude more work on the cracker and his cloud. Things like PBKDF2 or scrypt (to be honest, I haven't tried scrypt yet, so don't blame me for anything wrong with it). PBKDF2 is already supported in many web application frameworks, .NET among them, so for many applications the migration should not be THAT difficult.

To make a long story short, PBKDF2 works by taking a key (in this case a password), a salt (no, not NaCl; get away from my blog, chemists!) and an iteration count, and deriving a key from them (in our case the final hash value). The difference is that you now have to go through the entire iteration count to derive one hash from one plaintext, which is obviously much more expensive to compute.
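As a concrete illustration, here is roughly what using PBKDF2 looks like with Python's standard library (hashlib.pbkdf2_hmac; the parameter values are illustrative, not a recommendation tuned for your hardware):

```python
import hashlib
import os

password = b"correct horse battery staple"
salt = os.urandom(16)  # random per-user salt, stored alongside the hash
iterations = 10_000    # the knob that makes the cracker's life miserable

# Derive a 32-byte key; every password guess must repeat all 10,000
# HMAC iterations, which is exactly the point.
key = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)
print(key.hex())
```

Verifying a login then means re-deriving the key with the stored salt and iteration count and comparing the results.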


Using PBKDF2 or scrypt is not a panacea, however, because you still need to do it correctly. For example, you shouldn't even think about using an iteration count of less than 2000 (some say the minimum is 1000), and many systems use iteration counts of up to 10,000, which is delicious. Have you heard about the BlackBerry backup data encryption cracked a few months ago? They used an iteration count of one, effectively rendering all the advantages of PBKDF2 (which stands for Password-Based Key Derivation Function 2, btw) utterly useless.

If your business case permits, you don't even need to disclose the salt and iteration count to add a bit more obscurity to your security (add, not replace, mind you; obscurity is not a replacement for actual security).

Oh, and that's just one great usage of the cloud to crack passwords or maybe perform pen-tests (haha); see this example of cracking WPA passwords using the cloud: WPA Cracker.

That's it for now, for more info you can read about Amazon's GPU clusters or StackSmashing's coverage of this interesting topic.

Thursday, November 18, 2010

Efficiency of Increment and Decrement Operators

By the name of God, most merciful and most gracious, I will inaugurate the first post of this blog ever.

As suggested by the title above, this post will discuss, albeit briefly, a little efficiency factor seldom optimized by programmers and engineers everywhere: The Increment (++) and Decrement (--) operators.

int x = 1;
x++; // post-increment
++x; // pre-increment

Both of these forms of incrementing the variable x act correctly, but is there any difference?

The way x++ works is the following: create a temporary copy of x, increment the original x, and return the temporary copy (the old value).
As for ++x: increment x, return x.


Note that this tip concerns standalone ++ operator calls

If the ++ operator is used for trivial types such as int and double, then the performance will not take a hit if x++ is used instead of ++x.

But for non-trivial types, like bigger objects, the choice can matter a lot. An object that overloads the ++ operator might be large, so creating a copy of it for the postfix form can hurt performance. In such cases, using ++x is better.
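Python has no ++ operator, but the copy cost is easy to mimic with a hypothetical BigCounter class: incremented() plays the role of postfix x++ (copy the old value, bump the original, return the copy), while increment() plays prefix ++x (bump in place, no copy):

```python
class BigCounter:
    """Stand-in for a large object that overloads ++ (hypothetical)."""
    copies = 0  # class-wide tally of copies made

    def __init__(self, value, payload=None):
        self.value = value
        self.payload = payload if payload is not None else [0] * 10_000

    def incremented(self):
        # Like postfix x++: copy the old value, bump the original, return the copy.
        BigCounter.copies += 1
        old = BigCounter(self.value, list(self.payload))  # expensive copy
        self.value += 1
        return old

    def increment(self):
        # Like prefix ++x: bump in place, no copy at all.
        self.value += 1
        return self

c = BigCounter(0)
c.incremented()  # pays for copying the 10,000-element payload
c.increment()    # free of copies
print(BigCounter.copies)  # 1
```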

Embedded systems and mobile platforms, in particular, can benefit a lot from good use of the ++ operator.