VRIDGE X100 - PCI Express Expander

1 PCI Express Host Card - 4 PCI Express slot board
Cost - $5,000


2. Host Card, Expander Board

3. Detail

Chipset : Lucid HYDRA 100
External connector : PCI-Express x16 connector
Power connectors : ATX 24-pin power connector, ATX 8-pin power connector
Power : 4W (chip only)
Bus connector : PCI-Express x16 slot
PCI-Express slots : 4
OS : Windows XP (32-bit / 64-bit), Windows Vista (32-bit / 64-bit)
Size : 223.8mm × 264.2mm

External connector : PCI-Express x16 connector
Bus : PCI-Express x16
Size : 131mm × 69mm × 14mm


IBM, Power 7 Processor

IBM announced the Power7 processor at Hot Chips 21.
It is built on a 45nm process with up to eight cores supporting 32 threads, so a 32-way server can combine 256 cores with 1024 threads. It has a 32MB eDRAM cache and dual 4-channel DDR3 memory controllers, with memory bandwidth of 300GB/s or higher.
Insight64 analyst Nathan Brookwood explained that Power7 is one of the fastest CPUs and that in some respects it would exceed Intel's Nehalem.
Power7 is scheduled to be released officially next year.

IBM, "POWER7: IBM's Next Generation POWER Microprocessor", Hot Chips 21, Stanford Univ.
Power7 Spec. : 45nm, 8 cores, 32 threads, 32MB eDRAM cache, dual 4 channel DDR3, 300 GB/s Memory bandwidth.



Pico Computing : FPGA cluster

SC3 E-16 SuperCluster
  • Up to 77 E-16LX50 Cards
  • Up to 3.5 Million Logic Cells
  • 4U Rackmount Chassis
  • Dual Intel Quad Core Xeon Processor
  • 8GB RAM
  • Dual Hard Drives
  • DVD-RW
  • Linux

EX-160 E-16 Backplane
  • Shown with 7 E-16LX50 FPGA Cards
  • Full Size x8 PCIe Backplane
  • Holds up to 7 E-16LX50 FPGA Cards
  • Up to 390,000 Logic Cells
  • Power: Externally Powered by a Standard 2x3 PCIe Power Connector
  • Power Consumption: Approximately 30 Watts Max

PICO E-16 (ExpressCard/34 Format)
  • Virtex-5 LX50 FPGA
  • 32MB PSRAM
  • PLX PEX8311 PCI Express Bridge
  • x1 PCI Express Lane
  • JTAG for Debugging
  • RoHS Compliant

SC4 EX-300 SuperCluster
  • Up to 7 EX-300 Boards
  • Up to 8.3 Million Logic Cells
  • 4U Rackmount Chassis
  • Dual Intel Quad Core Xeon Processor
  • 8GB RAM
  • Dual Hard Drives
  • DVD-RW
  • Linux

  • Full Size x1 PCIe Board
  • 16 Xilinx Spartan XC3S5000 FPGAs
  • 1.3 Million Logic Cells
  • Power: Externally Powered by a Standard 2x3 PCIe Power Connector
  • Power Consumption: Approximately 75 Watts Max



RAR Crack Software Benchmark

[Current Best]
  • igrargpu_v0.4 - Radeon HD 4870 : 2800 pw/sec (pw len=8)

Bruteforce Software List

Microsoft Products : Mail, Money, Schedule+, Backup
Instant Messengers : AOL, ICQ, MSN, most instant messengers
Hardware : Cisco IOS


FSE 2009 : How Fast is AES?

FSE 2009 : 'How Fast is AES?' by Emilia Kasper

using AES-NI, AES CBC speed : 70 cycles/block
using AES-NI in parallel mode (2010), AES speed : 12 cycles/block

Asm in AMD Athlon 64 X2 3800+ : 166.88 cycles/block

[Emilia Kasper]
bitslice mode in Intel Core 2 Quad Q9550 : 129.6 cycles/block

[Emilia Kasper]
AES GCM mode in Intel Core 2 Quad Q9550 : 184 cycles/block
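
Many AES benchmarks are quoted in cycles per byte rather than cycles per block. Since an AES block is 16 bytes, the figures above can be converted by dividing by 16. A minimal sketch follows; the dictionary labels and the function name are illustrative only and do not come from the talk.

```python
# AES always works on 16-byte (128-bit) blocks.
BLOCK_BYTES = 16

# Cycles per block as listed above; the labels are illustrative names only.
cycles_per_block = {
    "AES-NI, CBC": 70,
    "AES-NI, parallel mode": 12,
    "Athlon 64 X2 3800+, asm": 166.88,
    "Core 2 Quad Q9550, bitslice": 129.6,
    "Core 2 Quad Q9550, AES-GCM": 184,
}


def to_cycles_per_byte(cpb):
    """Convert a cycles-per-block figure into cycles per byte."""
    return cpb / BLOCK_BYTES


for name, cpb in cycles_per_block.items():
    print(f"{name}: {to_cycles_per_byte(cpb):.2f} cycles/byte")
```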



Intel : Advanced Encryption Standard (AES) Instructions Set White Paper

Using AES-NI with Parallel Modes of Operation

This chapter explains how throughput can be enhanced with AES using parallel modes of operation. Consider the code snippet described in Figure 7, for encrypting AES-128 in ECB mode. In this example, there are 8 data blocks in xmm2-xmm9, and a Round Key is loaded into xmm1. For each round, 8 AES round instructions are dispatched, operating on the 8 data blocks with the same Round Key. Then, the next round key is loaded. The encryption results of the 8 blocks are eventually stored into memory, and the code is then ready to load a new set of 8 data blocks. This way, the program encrypts 8 data blocks simultaneously, but the order is different from the order shown in the previous chapters. Instead of completing the encryption of one block and then continuing to the next block, the code computes one AES round on all 8 blocks, using one Round Key, and then continues to the next round (using the next Round Key).

This “loop-reversal” technique is applicable to any parallel mode of operation such as CTR and CBC decrypt (but not for CBC encrypt). The underlying fully-pipelined hardware implies that AES instructions could be dispatched “each cycle” if data is available. In a parallel mode of operation, using the AES instructions and loop-reversing software, data can indeed be made available in (almost) every cycle.
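
The loop reversal described above only changes the order in which independent (block, round) steps are executed, not the result. The following Python sketch illustrates the ordering with a toy round function. `toy_round` is a made-up stand-in and is NOT an AES round; all function names are hypothetical, and real code would use the AES-NI instructions on xmm registers instead.

```python
def toy_round(block, round_key):
    """Toy keyed byte mix standing in for one AES round (NOT real AES)."""
    return bytes(((b ^ k) * 5 + 1) & 0xFF for b, k in zip(block, round_key))


def encrypt_block_major(blocks, round_keys):
    """Finish all rounds of one block before starting the next block."""
    out = []
    for block in blocks:
        for rk in round_keys:
            block = toy_round(block, rk)
        out.append(block)
    return out


def encrypt_round_major(blocks, round_keys):
    """Loop-reversed order: apply one round key to all blocks, then the next."""
    state = list(blocks)
    for rk in round_keys:
        # These per-block steps are independent of each other, which is
        # what lets pipelined hardware overlap them.
        state = [toy_round(b, rk) for b in state]
    return state


# 8 independent 16-byte blocks and 10 round keys, as in the ECB example.
blocks = [bytes([i] * 16) for i in range(8)]
round_keys = [bytes([(r * 17) % 256] * 16) for r in range(10)]

same = encrypt_block_major(blocks, round_keys) == encrypt_round_major(blocks, round_keys)
print("identical output:", same)
```

The two orders agree because no block depends on another block's result. That independence is what fails for CBC encryption, where each block needs the previous ciphertext block first.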

The following rough performance estimate illustrates the performance gain (neglecting loads and stores). Suppose that the latency of the AES instructions is L cycles, and L ≤ 8 (the actual latency of the AES instructions is L=6 cycles and this example uses 8 xmm registers). Then, encryption of the 8 data blocks would be completed after (roughly) 88+L cycles (pxor is done within 1 cycle). Therefore, the obtained throughput is around (88+6)/8 ≈ 12 cycles per block (16B), which approaches the theoretical throughput limit. This simplified estimate ignores several factors (e.g., loads/stores), but nevertheless, the measured effect is quite close to the estimated one.
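
The estimate can be written out as a small calculation. This sketch assumes that the 88 in the text is 8 blocks times 11 single-cycle issue slots (one pxor for the initial key addition plus 10 AES round instructions for AES-128). That decomposition is an interpretation, not something the excerpt states, and the function and parameter names are illustrative.

```python
def estimated_cycles_per_block(blocks=8, steps_per_block=11, latency=6):
    """Rough pipeline model: one instruction issued per cycle, plus the
    latency of the last instruction before all results are available.

    steps_per_block=11 is an assumption (1 pxor + 10 AES rounds).
    """
    total_cycles = blocks * steps_per_block + latency
    return total_cycles / blocks


print(estimated_cycles_per_block())  # (88 + 6) / 8
```

With these defaults the model gives 94/8 = 11.75, which the text rounds to 12 cycles per block.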

  • Intel Nehalem CPU, 3GHz, 4 cores - 1000M AES/sec
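
The 1000M AES/sec figure is consistent with 12 cycles per block on four 3GHz cores, assuming every core sustains that rate independently. A minimal sketch of the arithmetic, with illustrative function and parameter names:

```python
def aes_blocks_per_second(clock_hz, cores, cycles_per_block):
    """Upper-bound block rate if each core sustains cycles_per_block."""
    return clock_hz * cores / cycles_per_block


rate = aes_blocks_per_second(clock_hz=3e9, cores=4, cycles_per_block=12)
print(f"{rate / 1e6:.0f}M AES blocks/sec")
print(f"{rate * 16 / 1e9:.0f} GB/sec of data")  # 16 bytes per block
```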



AES Calculator

You can use the AES Calculator applet displayed below to encrypt or decrypt
using AES the specified 128-bit (32 hex digit) data value with the
128/192/256-bit (32/48/64 hex digit) key, with a trace of the calculations.
Some example values which may be used are given below.

Example AES test values (taken from FIPS-197) are:

Key (192-bit): 000102030405060708090a0b0c0d0e0f1011121314151617

Key (256-bit): 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f

Encrypting the plaintext with the key should give the ciphertext,
decrypting the ciphertext with the key should give the plaintext.
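
To check the two keys above without the applet, here is a compact pure-Python sketch of AES block encryption. It is written for readability, not speed or side-channel safety, and all function names are illustrative. The plaintext and the expected ciphertexts are not in the text above; they are taken from the FIPS-197 Appendix C example vectors.

```python
def xtime(a):
    """Multiply by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gmul(a, b):
    """Multiply two bytes in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


def build_sbox():
    """S-box = multiplicative inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key(key):
    """Key schedule for 16-, 24- or 32-byte keys; returns 16-byte round keys."""
    nk = len(key) // 4
    nr = nk + 6
    words = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 1
    for i in range(nk, 4 * (nr + 1)):
        temp = list(words[i - 1])
        if i % nk == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord, SubWord
            temp[0] ^= rcon
            rcon = xtime(rcon)
        elif nk > 6 and i % nk == 4:
            temp = [SBOX[b] for b in temp]  # extra SubWord for 256-bit keys
        words.append([a ^ b for a, b in zip(words[i - nk], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(nr + 1)]


def mix_columns(state):
    out = []
    for c in range(4):
        a = state[4 * c:4 * c + 4]
        for r in range(4):
            out.append(gmul(a[r], 2) ^ gmul(a[(r + 1) % 4], 3)
                       ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return out


def encrypt_block(plaintext, key):
    """Encrypt one 16-byte block; the state is stored column by column."""
    round_keys = expand_key(key)
    state = [p ^ k for p, k in zip(plaintext, round_keys[0])]
    for rnd, rk in enumerate(round_keys[1:], start=1):
        state = [SBOX[b] for b in state]                            # SubBytes
        state = [state[(i + 4 * (i % 4)) % 16] for i in range(16)]  # ShiftRows
        if rnd < len(round_keys) - 1:            # last round skips MixColumns
            state = mix_columns(state)
        state = [s ^ k for s, k in zip(state, rk)]                  # AddRoundKey
    return bytes(state)


# Plaintext and expected ciphertexts from FIPS-197 Appendix C.
PLAINTEXT = bytes.fromhex("00112233445566778899aabbccddeeff")
KEY_192 = bytes.fromhex("000102030405060708090a0b0c0d0e0f1011121314151617")
KEY_256 = bytes.fromhex(
    "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f")
EXPECTED_192 = "dda97ca4864cdfe06eaf70a0ec0d7191"
EXPECTED_256 = "8ea2b7ca516745bfeafc49904b496089"

print(encrypt_block(PLAINTEXT, KEY_192).hex() == EXPECTED_192)
print(encrypt_block(PLAINTEXT, KEY_256).hex() == EXPECTED_256)
```

For anything beyond checking test vectors, a vetted cryptography library is the better choice.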



Welcome to Chip Architect



[Detail Spec.]
  • Xilinx XC3S1000 - 120 FPGAs, 136Mhz
  • 2^56 / 12.8 days = 65,156 M/sec
  • Cost : $10,000 (real cost <$100,000)
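
The rate in the spec above can be reproduced directly, and dividing it across the 120 FPGAs gives a rough idea of the work done per chip per clock. This sketch assumes the 2^56 key space is searched completely in 12.8 days and that the load is spread evenly. The per-clock figure is a derived estimate, not a number from the source.

```python
SECONDS_PER_DAY = 86_400

keyspace = 2 ** 56   # size of a 56-bit key space
days = 12.8          # time for a full search, from the spec above
fpgas = 120
clock_mhz = 136

rate_m = keyspace / (days * SECONDS_PER_DAY) / 1e6  # million keys/sec
per_fpga_m = rate_m / fpgas                         # million keys/sec per FPGA
keys_per_clock = per_fpga_m / clock_mhz             # keys per FPGA per cycle

print(f"total:    {rate_m:,.0f} M/sec")
print(f"per FPGA: {per_fpga_m:,.0f} M/sec")
print(f"per FPGA per clock: {keys_per_clock:.2f} keys")
```

At 136MHz this works out to roughly four keys tested per FPGA per clock cycle.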

FPGA based SHA1 implementation

[1] "Optimizing SHA-1 Hash Function for High Throughput with a Partial Unrolling Study", Integrated Circuit and System Design, 2005

[2] "Throughput Optimized SHA-1 Architecture Using Unfolding Transformation", Application-specific Systems, Architectures and Processors, 2006. ASAP '06.

3,541 Mbps = 22M/sec
10.4 Gbps = 65M/sec
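
The two conversions above match a division of the throughput by 160 bits, the SHA-1 digest size. Hardware throughput is usually quoted in input bits, and SHA-1 consumes one 512-bit block per short message, so dividing by 512 may be the more conservative reading. The sketch below shows both; which divisor the cited papers intend is an assumption to verify against them.

```python
DIGEST_BITS = 160  # SHA-1 output size
BLOCK_BITS = 512   # SHA-1 input block size


def hashes_per_second(throughput_bps, bits_per_hash):
    """Convert a bit-rate into a hash rate for a given bits-per-hash."""
    return throughput_bps / bits_per_hash


for label, bps in (("3,541 Mbps", 3541e6), ("10.4 Gbps", 10.4e9)):
    by_digest = hashes_per_second(bps, DIGEST_BITS) / 1e6
    by_block = hashes_per_second(bps, BLOCK_BITS) / 1e6
    print(f"{label}: {by_digest:.1f}M/sec (per 160 bits), "
          f"{by_block:.1f}M/sec (per 512-bit block)")
```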

NSA@Home : FPGA-based SHA-1 and MD5 bruteforce cracker

NSA@home is a fast FPGA-based SHA-1 and MD5 bruteforce cracker. It is capable of searching the full 8-character keyspace (from a 64-character set) in about a day in the current configuration for 800 hashes concurrently, using about 240W of power. This performance is equivalent to over 1500 Athlon FX-60 CPUs, which would take about 250kW.

[Detail Spec.]
  • core chips : 15 Virtex-II Pro (XC2VP20) FPGAs
  • control chips : 3 Spartan-II (XC2S50) FPGAs
  • DSP : 1 ADSP21160M (which probably calculated transform parameters)
  • Size : 1u case
  • Power : about 120W while operating with 6 fans
  • Speed : 2^(6 * 8) = 2^48 ≈ 256,000,000M/day ≈ 3000M/sec
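
The speed figure follows from the keyspace size: 8 characters from a 64-character set is 64^8 = 2^48 candidates, searched in about one day. The sketch below computes the exact value next to the rounded figure in the list, which appears to approximate 2^48 as 256 × 10^12. That reading of the rounding is an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60

keyspace = 64 ** 8  # 8 characters, 64 possible symbols each
print("64^8 equals 2^48:", keyspace == 2 ** 48)

exact_rate = keyspace / SECONDS_PER_DAY  # candidates/sec for one day
rounded_rate = 256e12 / SECONDS_PER_DAY  # using the 256,000,000M figure

print(f"exact:   {exact_rate / 1e6:,.0f} M/sec")
print(f"rounded: {rounded_rate / 1e6:,.0f} M/sec")
```

Either way the result is on the order of 3000M candidates per second.
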
[good choice]
  • SHA1 best 'ATI Radeon HD 4870 X2 - IGHASHGPU' : 640M/sec, 250W, $420
  • MD5 best 'ATI Radeon HD 4870 X2 - IGHASHGPU' : 2400M/sec, 250W, $420
  • SHA1/MD5 NSA@Home : 3000M/sec, 120W, $??,000
  • NSA@Home - http://nsa.unaligned.org/
  • The complete SHA-1 chip Verilog source can be found here.
  • The MD5 chip uses most of the files from the SHA-1 one, and the new hash & toplevel is here.
  • Spartan-II USB interface sources are here
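
One way to compare the options listed under [good choice] is hash rate per watt. The sketch below uses only the rates and power figures from that list. The NSA@Home price is not given, so cost efficiency is computed for the GPU only. The tuple layout and function name are illustrative.

```python
# (label, Mhash/sec, watts) taken from the [good choice] list.
systems = [
    ("HD 4870 X2, SHA-1", 640, 250),
    ("HD 4870 X2, MD5", 2400, 250),
    ("NSA@Home, SHA-1/MD5", 3000, 120),
]


def mhash_per_watt(mhash_per_sec, watts):
    return mhash_per_sec / watts


for label, rate, watts in systems:
    print(f"{label}: {mhash_per_watt(rate, watts):.2f} Mhash/sec per watt")

# Cost efficiency is only computable for the GPU ($420 in the list).
gpu_md5_per_dollar = 2400 / 420
print(f"HD 4870 X2, MD5: {gpu_md5_per_dollar:.2f} Mhash/sec per dollar")
```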


Cheap and powerful MD5 Cracker

Marc Bevand, "MD5 Chosen-Prefix Collisions on GPUs", Black Hat USA 2009, July 30, 2009

  • 4 Radeon HD 4870 X2 in single machine
  • 8 GPU
  • About $1500
  • Total of 6500 Mhash/sec (using IGHASHGPU s/w, 2400*4=9600 Mhash/sec)
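
The two totals in the last bullet can be compared with a little arithmetic. This sketch treats 6500 Mhash/sec as the reported rate for the four-card machine and 9600 Mhash/sec as the nominal IGHASHGPU figure. The two numbers may come from different workloads, so the ratio is only indicative, and the variable names are illustrative.

```python
reported_mhash = 6500  # total reported for the 4-card machine
per_card_mhash = 2400  # IGHASHGPU MD5 rate for one HD 4870 X2
cards = 4
cost_usd = 1500        # "About $1500"

nominal_mhash = per_card_mhash * cards
ratio = reported_mhash / nominal_mhash
reported_per_dollar = reported_mhash / cost_usd
nominal_per_dollar = nominal_mhash / cost_usd

print(f"nominal total: {nominal_mhash} Mhash/sec")
print(f"reported / nominal: {ratio:.1%}")
print(f"per dollar: {reported_per_dollar:.2f} reported, "
      f"{nominal_per_dollar:.2f} nominal Mhash/sec")
```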


MD5 Software Benchmark

[Current fastest MD5 Software]
  • ATI Radeon HD 4870 X2 - IGHASHGPU : 2400M/sec


The Performance of Processors

[Current fastest processor]

  • HD 4870 X2 - 2400 GFLOPS



SHA1 Software Benchmark

[Current fastest SHA1 Software]

  • ATI Radeon HD 4870 X2 - IGHASHGPU : 640M/sec