February 06, 2011
Building the ultimate bad arse CUDA cracking server...
Written by
David Kennedy
Penetration Testing
Security Testing & Analysis
If you followed my blog post from about a year ago, a friend of mine, Josh Kelley (twitter: @winfang98), and I built a CUDA cracking server around an ASROCK overclocker motherboard and 4 GTX 295's, which was a nice accomplishment since we built it from scratch. This time around Josh and I wanted to outdo ourselves and build something a little more crazy. I have to caveat this though: we cheated a little bit and didn't build this one ourselves. Due to the massive amount of customization needed on the motherboard, we decided to use a company called RenderStream, which Martin Bos (@purehate_) recommended. Special thanks to Pure_Hate for all his help on this, he is the CUDA god :)
We still have some work to do, adding two Magma expansion slots and up to 10 new cards (coming soon). On to the machine's specifications: it's a 4U enclosure with three 1200W power supplies (208V) and 8 Nvidia GTX 580's, each with 512 cores (4096 cores total) and 1.5GB of GDDR5. That's a total of about 12 teraflops of GPU processing. It's using the Intel Xeon X5677 quad-core processor and a custom RenderStream VDACTr8 motherboard with eight double-wide PCI-E slots. It's got 12GB of 1333MHz DDR3, an Intel X25-E 64GB solid state drive, and a Seagate 500GB 7200RPM hard drive. For now it's running Ubuntu 10.10 x64 Maverick. We figured RAID configurations and the like weren't necessary, as redundancy isn't a major issue.
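As a rough sanity check on the core count and the teraflop figure, here's a small Python sketch. The core and card counts come from the specs above. The clock and the two-operations-per-clock figure are assumptions on my part: I'm treating the 1564Mhz that cudaHashcat reports per device further down as the shader clock, and counting one fused multiply-add per core per clock as 2 floating-point operations.

```python
# Figures taken from the spec paragraph: 8 cards, 512 CUDA cores each.
CARDS = 8
CORES_PER_CARD = 512

# Assumptions (not stated in the spec paragraph):
# - shader clock of 1564 MHz, read off the cudaHashcat device listing
# - 2 floating-point ops per core per clock (one fused multiply-add)
SHADER_CLOCK_HZ = 1564e6
FLOPS_PER_CORE_PER_CLOCK = 2

total_cores = CARDS * CORES_PER_CARD
peak_tflops = total_cores * FLOPS_PER_CORE_PER_CLOCK * SHADER_CLOCK_HZ / 1e12

print(f"total cores: {total_cores}")
print(f"theoretical single-precision peak: {peak_tflops:.1f} TFLOPS")
```

Under those assumptions the theoretical peak lands a little above the 12 teraflops quoted, which is close enough for a back-of-the-envelope number. Real cracking throughput depends on the hash kernel, not on peak floating-point rate.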
We've already got 4 GTX 295's from the old server, so once the Magma expansion slots are in we'll start off with a total of 8 GTX 580's and 4 GTX 295's.
From a pure performance perspective in cracking NTLM/MD5 hashes, we haven't finished the fine tuning, but here's what we've got so far. This is using the latest Multiforcer 0.80 Alpha4, which now supports multiple GPU's. One thing I'll say is that it's pretty buggy, so use at your own risk, but we averaged around 22.1 billion passwords per second on 8 GTX 580's:
0: GPU: 2998.01M/s
1: GPU: 2996.78M/s
2: GPU: 2997.31M/s
3: GPU: 2996.64M/s
4: GPU: 2633.48M/s
5: GPU: 2567.71M/s
6: GPU: 2355.49M/s
7: GPU: 2568.05M/s
TOTAL : 22113.47M/s
Looking at the above, we're not getting as much performance from the last four GPU's as from the first four, which seems to be a limitation of the Alpha version of Multiforcer.
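To put a number on that gap, here's a quick Python sketch that totals the per-GPU rates from the Multiforcer output and compares the two halves. The rates are copied from the listing above; the "projected" figure is only a what-if that assumes all eight cards could run at the average of the first four.

```python
# Per-GPU rates in millions of hashes per second, copied from the Multiforcer output.
rates = [2998.01, 2996.78, 2997.31, 2996.64,
         2633.48, 2567.71, 2355.49, 2568.05]

total = sum(rates)
first_four = sum(rates[:4]) / 4   # average of GPUs 0-3
last_four = sum(rates[4:]) / 4    # average of GPUs 4-7

# Fraction by which the last four GPUs trail the first four.
shortfall = 1 - last_four / first_four

# Hypothetical total if every card matched the first-four average.
projected = first_four * len(rates)

print(f"total:            {total:.2f} M/s")
print(f"first four (avg): {first_four:.2f} M/s")
print(f"last four (avg):  {last_four:.2f} M/s")
print(f"shortfall:        {shortfall:.1%}")
print(f"projected total:  {projected:.2f} M/s")
```

The sum matches the TOTAL line Multiforcer printed, and the last four cards come in roughly 15% behind the first four.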
Using oclHashcat on MD5 hashes, we averaged around 14.2 billion per second:
root@newcuda:/opt/oclHashcat-0.25# ./cudaHashcat64.bin example.hash -n 160 -1 ?l?u?s?d ?1?1?1?1 ?1?1
cudaHashcat v0.25 starting...
Digests: 6494 entries, 6494 unique
Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes
Platform: NVidia compatible platform found
Device #1: GeForce GTX 580, 1535MB, 1564Mhz, 16MCU
Device #2: GeForce GTX 580, 1535MB, 1564Mhz, 16MCU
Device #3: GeForce GTX 580, 1535MB, 1564Mhz, 16MCU
Device #4: GeForce GTX 580, 1535MB, 1564Mhz, 16MCU
Device #5: GeForce GTX 580, 1535MB, 1564Mhz, 16MCU
Device #6: GeForce GTX 580, 1535MB, 1564Mhz, 16MCU
Device #7: GeForce GTX 580, 1535MB, 1564Mhz, 16MCU
Device #8: GeForce GTX 580, 1535MB, 1564Mhz, 16MCU
Device #1: Kernel ./kernels/4318/m0000.sm_20.64.cubin
Device #2: Kernel ./kernels/4318/m0000.sm_20.64.cubin
Device #3: Kernel ./kernels/4318/m0000.sm_20.64.cubin
Device #4: Kernel ./kernels/4318/m0000.sm_20.64.cubin
Device #5: Kernel ./kernels/4318/m0000.sm_20.64.cubin
Device #6: Kernel ./kernels/4318/m0000.sm_20.64.cubin
Device #7: Kernel ./kernels/4318/m0000.sm_20.64.cubin
Device #8: Kernel ./kernels/4318/m0000.sm_20.64.cubin
[s]tatus [p]ause [r]esume [h]elp [q]uit => s
Status....: Running
Mode.Left.: Mask '?1?1?1?1' (81450625)
Mode.Right: Mask '?1?1' (9025)
Speed.GPU*: 14.1G/s
Recovered.: 0/6494 Digests, 0/1 Salts
Progress..: 5368709120/735091890625 (0.73%)
Running...: 0 secs
Estimated.: 1 min, 5 secs
5e5828c277f461d6dc3fbb1eac4e2fa6:99saso
b27800b6b7115fb4b7a5ce271ec0a8b2:E.Isso
[s]tatus [p]ause [r]esume [h]elp [q]uit => s
Status....: Running
Mode.Left.: Mask '?1?1?1?1' (81450625)
Mode.Right: Mask '?1?1' (9025)
Speed.GPU*: 14.2G/s
Recovered.: 2/6494 Digests, 0/1 Salts
Progress..: 27514634240/735091890625 (3.74%)
Running...: 2 secs
Estimated.: 1 min, 3 secs
992fd66b4543e0ac23f3193d51b746ee:qmmnsc
467976bdd5c0234978aabbd21f303e5c:nzutus
3e2dfae2aa872916340eb22d9e79f98c:derby_
7c2826cc67833595ee6bfcf064daf21e:save60
1c8f19ba41b153a0b3d6fac48d4e5f5e:30re50
fcc3304d02b1ae024a064f8da62c736c:zone50
e5ce937a7ae5ae9d5783687fe5ce4fe8:jade40
39f0766d6ca8c92b43d1bb7f20963e1b:p3nk12
5276f23962da83c8a2fb58e4a722f6fc:rozee5
8c33a8c9d760f50d3fbd09928a806b0e:h8ww78
6b576027375ad4912263e95319f351bd:giofio
4d569a18c6fa26bbe351e6b2ac209e7b:keribu
06cd562bfbe200ff0bc5791592706d71:lopiaz
af6f5ab83cb39d2433b12c7eec98d859:willy$
fef4240115cd0b68ce7f8d76fea7e8c8:deni--
a54b35ab1abf1ae437dad96fb0f22c4b:tomi10
1459ccf0940e63051d5a875a88acfaaf:pigi00
7becb9424f38abff581f6f2a82ff436a:sail00
93874433619186c89a1ec8f41600c5be:saml10
c9944c427745e47c3d53ed63a70658d1:lilla3
844016a5a4e54517f9fa041fffafbdc8:blux13
562c968651f2ce9c3a5f45d6acf69357:kimi73
3737c01191dc66b6f756ba265b6b706b:minu69
9ddbe0017d767560ed0fba20f63958e4:lucdea
820079d5ccdf6e725f612b72c404fa4d:456ira
07c1252862faff7a560ba64cbc945fe5:aleyle
[s]tatus [p]ause [r]esume [h]elp [q]uit => s
Status....: Running
Mode.Left.: Mask '?1?1?1?1' (81450625)
Mode.Right: Mask '?1?1' (9025)
Speed.GPU*: 14.1G/s
Recovered.: 28/6494 Digests, 0/1 Salts
Progress..: 193965588480/735091890625 (26.39%)
Running...: 17 secs
Estimated.: 48 secs
[s]tatus [p]ause [r]esume [h]elp [q]uit =>
So far so good. After killing two 15 amp surge protectors, we upped it to three 30 amp surge protectors and haven't blown one yet :) More to come as we keep expanding the overall system and tweaking as we go. The major limitation at this point seems to be the software. There are some Windows alternatives we haven't tested yet that should get us upward of 33-38 billion per second with our current configuration.
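For anyone who wants to check the numbers in the oclHashcat status output above, here's a short Python sketch of where the keyspace and the time estimate come from. The character-set sizes are hashcat's built-in ones (?l and ?u are 26 letters each, ?d is 10 digits, ?s is 33 symbols including space). The progress and running-time values are copied from the last status screen. Keep in mind the running time is reported in whole seconds, so the wall-clock rate here is only approximate.

```python
# Sizes of the built-in hashcat charsets combined into custom charset ?1.
CHARSET_SIZES = {"?l": 26, "?u": 26, "?s": 33, "?d": 10}
charset = sum(CHARSET_SIZES.values())

# Left mask is four ?1 positions, right mask is two ?1 positions.
left = charset ** 4
right = charset ** 2
keyspace = left * right

# Values copied from the last status screen in the run above.
progress = 193965588480   # candidates tried so far
elapsed_secs = 17         # "Running...: 17 secs" (whole seconds only)

# Wall-clock rate implied by the progress counter, and the time left at that rate.
effective_rate = progress / elapsed_secs
eta_secs = (keyspace - progress) / effective_rate

# Time for the whole keyspace if the reported 14.2G/s were sustained end to end.
ideal_secs = keyspace / 14.2e9

print(f"charset size:   {charset}")
print(f"keyspace:       {left} x {right} = {keyspace}")
print(f"effective rate: {effective_rate / 1e9:.1f} G/s")
print(f"time remaining: {eta_secs:.0f} secs")
print(f"ideal full run: {ideal_secs:.0f} secs at 14.2 G/s")
```

The keyspace matches the 735091890625 in the Progress line, and the remaining time works out to about what oclHashcat estimated. The rate implied by the progress counter is lower than the Speed.GPU* figure, so the estimate appears to track overall progress rather than the raw GPU speed, though that's my reading of the output and not something the tool states.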