Code Breaker: August 2009

8/27/2009

VRIDGE X100 - PCI Express Expander

1 PCI Express host card + 1 expander board with 4 PCI Express slots

Cost - $5,000

1. CASE

2. Host Card, Expander Board

3. Detail

[Expander Board]
chip-set	Lucid HYDRA 100
external connector	PCI-Express x16 connector
power connector	ATX 24-pin power connector, ATX 8-pin power connector
power	4W (chip only)
bus connector	PCI-Express x16 slot
PCI-Express slots	4
OS	Windows XP (32-bit / 64-bit), Windows Vista (32-bit / 64-bit), Linux
size	223.8mm × 264.2mm

[Host Card]
external connector	PCI-Express x16 connector
bus	PCI-Express x16
size	131mm × 69mm × 14mm

http://www.elsa-jp.co.jp/english/products/pes/vridge_x100_dual16/index.html

IBM POWER7 Processor

IBM announced the POWER7 processor at Hot Chips 21.

Built on a 45-nm process, POWER7 has up to eight cores supporting 32 threads per chip, and a 32-way server can combine 256 cores running 1,024 threads. Each chip has a 32MB eDRAM cache and dual 4-channel DDR3 memory controllers, with memory bandwidth of 300 GB/s or more.
Insight64 analyst Nathan Brookwood said POWER7 is one of the fastest CPUs and would outperform Intel's Nehalem on some measures.

POWER7 is scheduled for official release next year.

IBM, "POWER7: IBM's Next Generation POWER Microprocessor", Hot Chips 21, Stanford Univ.

POWER7 Spec. : 45nm, 8 cores, 32 threads, 32MB eDRAM cache, dual 4-channel DDR3, 300 GB/s memory bandwidth.

http://www.hotchips.org/hc21/program/conference_day_two.htm
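Here is a quick check of how the POWER7 core and thread counts above fit together. The 4 threads per core comes from 32 threads / 8 cores. Reading "32-way" as 32 full 8-core chips is an assumption.

```python
# How the POWER7 core/thread counts above fit together.
# Assumptions: "32-way" means 32 POWER7 chips, each with all 8 cores enabled,
# and the 32 threads per chip are split evenly across its cores.
CORES_PER_CHIP = 8
THREADS_PER_CHIP = 32
CHIPS = 32

threads_per_core = THREADS_PER_CHIP // CORES_PER_CHIP   # 4-way SMT
total_cores = CHIPS * CORES_PER_CHIP
total_threads = total_cores * threads_per_core

print(threads_per_core, total_cores, total_threads)
```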

8/19/2009

Pico Computing : FPGA cluster

SC3 E-16 SuperCluster

Up to 77 E-16LX50 Cards
Up to 3.5 Million Logic Cells
4U Rackmount Chassis
Dual Intel Quad Core Xeon Processor
8GB RAM
Dual Hard Drives
DVD-RW
Linux

EX-160 E-16 Backplane

Full Size x8 PCIe Backplane
Holds up to 7 E-16LX50 FPGA Cards
Up to 390,000 Logic Cells
Power: Externally Powered by a Standard 2x3 PCIe Power Connector
Power Consumption: Approximately 30 Watts Max

PICO E-16 (ExpressCard/34 Format)

Virtex-5 LX50 FPGA
32MB PSRAM
PLX PEX8311 PCI Express Bridge
x1 PCI Express Lane
JTAG for Debugging
RoHS Compliant

SC4 EX-300 SuperCluster

Up to 7 EX-300 Boards
Up to 8.3 Million Logic Cells
4U Rackmount Chassis
Dual Intel Quad Core Xeon Processor
8GB RAM
Dual Hard Drives
DVD-RW
Linux

EX-300

Full Size x1 PCIe Board
16 Xilinx Spartan XC3S5000 FPGAs
1.3 Million Logic Cells
Power: Externally Powered by a Standard 2x3 PCIe Power Connector
Power Consumption: Approximately 75 Watts Max

[Link]

Pico - http://www.picocomputing.com/

8/18/2009

RAR Crack Software Benchmark

[Current Best]

igrargpu_v0.4 - Radeon HD 4870 : 2800 passwords/sec (password length 8)
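To see what 2800 passwords/sec means in practice, here is a rough worst-case search time for 8-character passwords. The benchmark does not say which character set was used, so the three charset sizes below are illustrative assumptions.

```python
# Worst-case time to try every 8-character password at the igrargpu rate above.
# The charset sizes are illustrative assumptions; the benchmark does not say
# which character set was used.
RATE = 2800            # passwords per second (Radeon HD 4870, igrargpu v0.4)
LENGTH = 8
SECONDS_PER_DAY = 86_400

CHARSETS = {
    "digits": 10,
    "lowercase": 26,
    "lower+upper+digits": 62,
}


def days_to_exhaust(charset_size, length=LENGTH, rate=RATE):
    """Days needed to try all charset_size**length candidates at `rate`/sec."""
    return charset_size ** length / rate / SECONDS_PER_DAY


for name, size in CHARSETS.items():
    print(f"{name:>20}: {days_to_exhaust(size):,.1f} days")
```

Under these assumptions, digits take under half a day and lowercase letters about 863 days. A 62-character set takes on the order of 2,500 years.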

[Link]

Crark - http://www.crark.net/
Parallel Password Recovery - http://www.parallelrecovery.com/rar-password.html
igrargpu - http://www.golubev.com/rargpu.htm
Advanced Archive Password Recovery - http://www.elcomsoft.com/archpr.html

Bruteforce Software List

Archives: ACE, ARJ, RAR, ZIP

MS Office: Access, OneNote, Outlook, Project, VBA, Word/Excel 97/2000 (40-bit), Word/Excel/PowerPoint 2007, Word/Excel/PowerPoint XP/2003 (40-128 bit)

Other Office: Lotus SmartSuite, OpenOffice.org, PDF, WordPerfect

Microsoft Products: Mail, Money, Schedule+, Backup

Windows: Asterisked, Dial-up, Remote Desktop

Operating Systems: Novell NetWare, UNIX/Linux, Windows 95/98/ME, Windows NT/2000/XP, Windows shares, Windows Vista

Databases: Clarion, Oracle, Paradox & Borland Database Engine, SQL

BIOS: AMI, AWARD, most BIOSes

E-mail clients: Calypso, Eudora, most e-mail clients, Outlook Express, The Bat!, Thunderbird

FTP clients: FAR, CuteFTP, most FTP clients, Total Commander

Instant Messengers: AOL, ICQ, most instant messengers, MSN

Browsers: Google Chrome, Internet Explorer, Mozilla Firefox, Opera

Business: Lotus Notes, QuickBooks

Web Authoring: cfCrypt/cfEncode, Windows Script Encoder

Disk/File Encryption: BestCrypt, EFS/NTFS, Icon Lock-It 2000, Norton Secret Stuff, Package for the Web (PFTW), PGP, PowerPacker decryptor

Hardware: Cisco IOS

Rarity: AIN, Edialer, MS Office 95, RAR 1.5

Hashes: MD4, MD5, SHA/SHA1

[Link]

Russian Password Cracker - http://www.password-crackers.com/crack.html

FSE 2009 : How Fast is AES?

FSE 2009 rump session : 'How Fast is AES?' by Emilia Käsper

[Intel]

Using AES-NI, AES-CBC : 70 cycles/block

Using AES-NI in a parallel mode (2010 processors) : 12 cycles/block

[Bernstein]

Assembly on AMD Athlon 64 X2 3800+ : 166.88 cycles/block

[Emilia Käsper]

Bitsliced implementation on Intel Core 2 Quad Q9550 : 129.6 cycles/block

AES-GCM on Intel Core 2 Quad Q9550 : 184 cycles/block

[Reference]

FSE 2009 : How Fast is AES? - http://fse2009rump.cr.yp.to/6f39b999acecf5fc659519df76dc01d3.pdf

8/17/2009

Intel : Advanced Encryption Standard (AES) Instructions Set White Paper

Using AES-NI with Parallel Modes of Operation

This chapter explains how throughput can be enhanced with AES using parallel modes of operation. Consider the code snippet described in Figure 7 of the white paper, for encrypting AES-128 in ECB mode. In this example, there are 8 data blocks in xmm2-xmm9, and a Round Key is loaded into xmm1. For each round, 8 AES round instructions are dispatched, operating on the 8 data blocks with the same Round Key. Then, the next round key is loaded. The encryption results of the 8 blocks are eventually stored to memory, and the registers are ready to load a new set of 8 data blocks. This way, the program encrypts 8 data blocks simultaneously, but the order is different from the order shown in the previous chapters. Instead of completing the encryption of one block and then continuing to the next block, the code computes one AES round on all 8 blocks, using one Round Key, and then continues to the next round (using the next Round Key). This “loop-reversal” technique is applicable to any parallel mode of operation, such as CTR and CBC decrypt (but not to CBC encrypt). The underlying fully-pipelined hardware implies that AES instructions could be dispatched “each cycle” if data is available. In a parallel mode of operation, using the AES instructions and loop-reversing software, data can indeed be made available in (almost) every cycle.

The following rough performance estimate illustrates the performance gain (neglecting loads and stores). Suppose that the latency of the AES instructions is L cycles, and L ≤ 8 (the actual latency of the AES instructions is L=6 cycles and this example uses 8 xmm registers). Each of the 8 blocks needs 11 instructions (one pxor with the first round key plus 10 AES round instructions for AES-128). Because L ≤ 8, each block's previous result is ready by the time its next round comes up, so the 88 instructions issue back-to-back, one per cycle. Encryption of the 8 data blocks would therefore be completed after (roughly) 88+L cycles (pxor is done within 1 cycle). The obtained throughput is around (88+6)/8 ≈ 12 cycles per block (16B), which approaches the theoretical throughput limit. This simplified estimate ignores several factors (e.g., loads/stores), but nevertheless, the measured effect is quite close to the estimated one.
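A toy cycle model of the loop-reversal schedule makes the estimate concrete. It uses only the assumptions stated above: one AES instruction issues per cycle, each result is ready L cycles later, pxor takes 1 cycle, and loads/stores are ignored. Treating aesenclast as having the same latency as aesenc is an added assumption. The function name `finish_cycle` and the serial comparison are illustrative and do not come from the white paper.

```python
# Toy cycle model of the "loop-reversal" schedule: one instruction issues per
# cycle, each result is ready `latency` cycles later, and a block's next round
# cannot start before its previous round has finished. Loads/stores ignored.
def finish_cycle(num_blocks, rounds=10, aes_latency=6, pxor_latency=1,
                 interleave=True):
    ready = [0] * num_blocks   # cycle when each block's latest result is ready
    next_issue = 0             # earliest free issue slot (one op per cycle)

    def issue(block, latency):
        nonlocal next_issue
        start = max(next_issue, ready[block])   # wait for slot and for data
        ready[block] = start + latency
        next_issue = start + 1

    # Step 0 is the initial pxor with round key 0; steps 1..rounds are AES rounds
    # (aesenclast assumed to have the same latency as aesenc).
    steps = [pxor_latency] + [aes_latency] * rounds
    if interleave:
        # loop-reversed: one round across all blocks, then the next round
        for latency in steps:
            for block in range(num_blocks):
                issue(block, latency)
    else:
        # serial: finish all rounds of one block before starting the next
        for block in range(num_blocks):
            for latency in steps:
                issue(block, latency)
    return max(ready)


for interleave in (False, True):
    cycles = finish_cycle(8, interleave=interleave)
    print(f"interleave={interleave}: {cycles} cycles, {cycles / 8:.2f} cycles/block")
```

With L=6, the interleaved schedule finishes 8 blocks in 93 cycles (about 11.6 cycles/block), in line with the (88+6)/8 estimate. The serial order takes 453 cycles (about 57 cycles/block), because every round waits out the full latency.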

[result]

Intel Nehalem CPU, 3GHz, 4 cores - 1000M AES blocks/sec
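The 1000M figure matches the white paper's 12 cycles/block estimate if each of the 4 cores runs its own parallel-mode stream at that rate. Below is a minimal sketch of that arithmetic. It assumes the figure counts 16-byte blocks and ignores loads/stores and memory bandwidth.

```python
# Aggregate AES throughput implied by a cycles-per-block figure.
# Assumptions: every core runs an independent parallel-mode stream at the
# same cycles/block, and loads/stores and memory bandwidth are ignored.
AES_BLOCK_BYTES = 16


def blocks_per_second(clock_hz, cores, cycles_per_block):
    return clock_hz * cores / cycles_per_block


rate = blocks_per_second(3e9, 4, 12)   # 3 GHz, 4 cores, 12 cycles/block
print(f"{rate / 1e6:,.0f}M blocks/sec, {rate * AES_BLOCK_BYTES / 1e9:.0f} GB/s")
```

The same function turns the FSE 2009 cycles/block figures into per-core rates at a given clock. For example, `blocks_per_second(3e9, 1, 129.6)` gives the bitsliced rate for one core at 3 GHz.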

[Link]

AES Instruction Set - http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf

AES SPEED

[Link]

AES Speed by D. J. Bernstein - http://cr.yp.to/aes-speed.html
AES Source Code by D. J. Bernstein - http://www.ecrypt.eu.org/stream/svn/viewcvs.cgi/ecrypt/trunk/benchmarks/aes-ctr/aes-128/?rev=216
Fast implementations by Helger Lipmaa - http://home.cyber.ee/helger/implementations/
Dag Arne Osvik - http://www.ii.uib.no/~osvik/
Hongjun - http://icsd.i2r.a-star.edu.sg/staff/hongjun/
Gladman - http://fp.gladman.plus.com/cryptography_technology/rijndael/
The eSTREAM Project - http://www.ecrypt.eu.org/stream/
Emilia Käsper - http://homes.esat.kuleuven.be/~ekasper/
AES Source Code by Emilia Käsper - http://homes.esat.kuleuven.be/~ekasper/code/aes-ctr.tar.bz2
Peter Schwabe - http://cryptojedi.org/users/peter/
AES Paper by Daniel J. Bernstein, Peter Schwabe: New AES software speed records - http://cryptojedi.org/papers/aesspeed-20080926.pdf
AES Instruction Set - http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf

AES Calculator

The AES Calculator applet encrypts or decrypts a specified 128-bit (32 hex digit) data value using AES
with a 128/192/256-bit (32/48/64 hex digit) key, and shows a trace of the calculations.
Some example values which may be used are given below.

Example AES test values (taken from FIPS-197) are:
Key: 000102030405060708090a0b0c0d0e0f
Plaintext: 00112233445566778899aabbccddeeff
Ciphertext: 69c4e0d86a7b0430d8cdb78070b4c55a

Key: 000102030405060708090a0b0c0d0e0f1011121314151617
Plaintext: 00112233445566778899aabbccddeeff
Ciphertext: dda97ca4864cdfe06eaf70a0ec0d7191

Key: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f
Plaintext: 00112233445566778899aabbccddeeff
Ciphertext: 8ea2b7ca516745bfeafc49904b496089

Encrypting the plaintext with the key should give the ciphertext;
decrypting the ciphertext with the key should give the plaintext.
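To check the first FIPS-197 vector without the applet, here is a minimal, unoptimized AES-128 encryption sketch in pure Python. It follows the FIPS-197 round structure (SubBytes, ShiftRows, MixColumns, AddRoundKey). It builds the S-box from the GF(2^8) inverse and affine map rather than a lookup table, handles 128-bit keys only, and is not constant-time. It is for illustration only.

```python
# Minimal AES-128 block encryption in pure Python (FIPS-197 structure).
# Illustration only: unoptimized, 128-bit keys only, not constant-time.

def xtime(a):
    """Multiply a byte by x (i.e. 2) in GF(2^8) mod x^8 + x^4 + x^3 + x + 1."""
    return ((a << 1) ^ 0x1B) & 0xFF if a & 0x80 else a << 1


def gmul(a, b):
    """Multiply two bytes in GF(2^8)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = xtime(a)
        b >>= 1
    return p


def build_sbox():
    sbox = []
    for x in range(256):
        # multiplicative inverse by brute force (0 maps to 0)
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        # affine map: inv ^ rotl1 ^ rotl2 ^ rotl3 ^ rotl4 ^ 0x63
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def expand_key(key):
    """AES-128 key schedule: 16-byte key -> 11 round keys of 16 bytes."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]   # RotWord, then SubWord
            t[0] ^= RCON[i // 4 - 1]
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(11)]


def mix_columns(s):
    out = []
    for c in range(4):
        a0, a1, a2, a3 = s[4 * c:4 * c + 4]
        out += [gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3,
                a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3,
                a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3),
                gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2)]
    return out


def encrypt_block(plaintext, key):
    round_keys = expand_key(key)
    # state byte i sits at row i % 4, column i // 4 (column-major, as in FIPS-197)
    s = [p ^ k for p, k in zip(plaintext, round_keys[0])]
    for rnd in range(1, 11):
        s = [SBOX[b] for b in s]                                            # SubBytes
        s = [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]  # ShiftRows
        if rnd < 10:
            s = mix_columns(s)                                              # MixColumns
        s = [b ^ k for b, k in zip(s, round_keys[rnd])]                     # AddRoundKey
    return bytes(s)


key = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
pt = bytes.fromhex("00112233445566778899aabbccddeeff")
print(encrypt_block(pt, key).hex())  # should match the FIPS-197 ciphertext above
```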

[Link]

AES Calculator - http://www.unsw.adfa.edu.au/~lpb/src/AEScalc/AEScalc.html
AES applet - http://www.unsw.adfa.edu.au/~lpb/src/AEScalc/AEScalc.jar
Java runtime for the applet (download - http://java.com/en/download/)

8/16/2009

Welcome to Chip Architect

[Link]

CHIP-ARCHITECT PUBLICATIONS - http://www.chip-architect.com/
Understanding the detailed Architecture of AMD's 64 bit Core - http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html

COPACOBANA : FPGA-based DES Cracker

[Detail Spec.]

Xilinx XC3S1000 - 120 FPGAs, 136 MHz
2^56 keys / 12.8 days = 65,156M keys/sec
Cost : $10,000 (real cost <$100,000)
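A quick check of the COPACOBANA numbers: the average rate needed to sweep 2^56 keys in 12.8 days, and what that works out to per FPGA per clock cycle. The result is about 4 keys per FPGA per clock at 136 MHz. That is consistent with each FPGA running about four fully pipelined DES engines, but this is an inference from the numbers, not a figure stated above.

```python
# Average DES key rate to sweep the full 56-bit keyspace in 12.8 days,
# and the implied keys per FPGA per clock (120 FPGAs at 136 MHz).
KEYSPACE = 2 ** 56
SECONDS = 12.8 * 86_400      # 12.8 days
FPGAS = 120
CLOCK_HZ = 136e6

keys_per_sec = KEYSPACE / SECONDS
keys_per_fpga_per_clock = keys_per_sec / FPGAS / CLOCK_HZ

print(f"{keys_per_sec / 1e6:,.0f}M keys/sec")
print(f"{keys_per_fpga_per_clock:.2f} keys per FPGA per clock")
```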

[link]

COPACOBANA - http://www.copacobana.org/

FPGA-based SHA1 implementation

[1] "Optimizing SHA-1 Hash Function for High Throughput with a Partial Unrolling Study", Integrated Circuit and System Design, 2005

Throughput : about 3,000 Mbps
http://www.springerlink.com/content/aq68x01g17teeaa5/fulltext.pdf

[2] "Throughput Optimized SHA-1 Architecture Using Unfolding Transformation", Application-specific Systems, Architectures and Processors, 2006. ASAP '06.
According to the FPGA implementation results, the throughput is 3,541 Mbps with a pipeline;
synthesis results using 0.18 μm CMOS technology showed 10.4 Gbps with a pipeline.
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4019540

3,541 Mbps ÷ 160-bit digest = 22M/sec
10.4 Gbps ÷ 160-bit digest = 65M/sec
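The M/sec figures above divide the throughput by the 160-bit digest size. If the papers' Mbps figures count message input bits (an assumption about how they measure throughput), then a short password is one 512-bit SHA-1 block per hash. That gives a rate about 3.2 times lower. Here is a small sketch of both conversions:

```python
# Two ways to turn the SHA-1 throughput figures above into hashes per second.
# Assumption: the papers' Mbps figures count message input bits, and each
# short password candidate is a single 512-bit SHA-1 block.
SHA1_BLOCK_BITS = 512    # one compression-function input block
SHA1_DIGEST_BITS = 160   # size of the SHA-1 output


def hashes_per_sec(throughput_bps, bits_per_hash):
    return throughput_bps / bits_per_hash


for label, bps in [("3,541 Mbps", 3541e6), ("10.4 Gbps", 10.4e9)]:
    per_block = hashes_per_sec(bps, SHA1_BLOCK_BITS) / 1e6
    per_digest = hashes_per_sec(bps, SHA1_DIGEST_BITS) / 1e6
    print(f"{label}: {per_block:.1f}M/sec per 512-bit block, "
          f"{per_digest:.1f}M/sec per 160-bit digest")
```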

NSA@Home : FPGA-based SHA-1 and MD5 bruteforce cracker

NSA@home is a fast FPGA-based SHA-1 and MD5 bruteforce cracker. It is capable of searching the full 8-character keyspace (from a 64-character set) in about a day in the current configuration for 800 hashes concurrently, using about 240W of power. This performance is equivalent to over 1500 Athlon FX-60 CPUs, which would take about 250kW.

[Detail Spec.]

core chips : 15 Virtex-II Pro (XC2VP20) FPGAs
control chips : 3 Spartan-II (XC2S50) FPGAs
DSP : 1 ADSP21160M (which probably calculated transform parameters)
Size : 1u case
Power : about 120W while operating with 6 fans
Speed : 64^8 = 2^(6 * 8) = 2^48 ≈ 281,475,000M/day ≈ 3,258M/sec (about 3000M/sec)

[good choice]

SHA1 best 'ATI Radeon HD 4870 X2 - IGHASHGPU' : 640M/sec, 250W, $420
MD5 best 'ATI Radeon HD 4870 X2 - IGHASHGPU' : 2400M/sec, 250W, $420
SHA1/MD5 NSA@Home : 3000M/sec, 120W, $??,000
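To compare the three options on energy, here is hash rate per watt computed from the quoted rates and power figures. NSA@Home's price is unknown, so cost is left out. The project description above quotes about 240W rather than 120W, and using that figure would halve NSA@Home's result.

```python
# Hash rate per watt for the three options listed above, using only the
# quoted rates and power draws (NSA@Home's price is unknown, so no cost).
OPTIONS = {
    "SHA1, HD 4870 X2 (IGHASHGPU)": (640e6, 250),
    "MD5, HD 4870 X2 (IGHASHGPU)": (2400e6, 250),
    "SHA1/MD5, NSA@Home": (3000e6, 120),
}


def mhash_per_watt(rate_per_sec, watts):
    """Millions of hashes per second per watt."""
    return rate_per_sec / 1e6 / watts


for name, (rate, watts) in OPTIONS.items():
    print(f"{name}: {mhash_per_watt(rate, watts):.1f}M/sec per W")
```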

[Link]

NSA@Home - http://nsa.unaligned.org/
The complete SHA-1 chip Verilog source is available on the NSA@Home site.
The MD5 chip uses most of the files from the SHA-1 one, with a new hash and top-level module, also on the site.
The Spartan-II USB interface sources are available there as well.