February 06, 2011

Building the ultimate bad arse CUDA cracking server...

Written by David Kennedy

Penetration Testing Security Testing & Analysis

If you followed my blog post from about a year ago, my friend Josh Kelley (twitter: @winfang98) and I built a CUDA cracking server consisting of an ASRock overclocker motherboard and 4 GTX 295's, which was a nice accomplishment given that we built it from scratch. This time around Josh and I wanted to outdo ourselves and build something a little more crazy. I have to caveat this though: we cheated a little bit and didn't build this one ourselves. Due to the massive amount of customization needed on the motherboard, we decided to use a company called RenderStream, which Martin Bos (@purehate_) recommended. Special thanks to Pure_Hate for all his help on this, he is the CUDA god :)

We still have some work to do, adding two Magma expansion slots and up to 10 new cards (coming soon). On to the machine's specifications: it's a 4U enclosure with three 1200W power supplies (208V) and 8 Nvidia GTX 580's, each with 512 cores (4096 cores total) and 1.5GB of GDDR5. That's a total of 12 teraflops of graphics processing. It's using the Intel X5677 4-core processor and a custom RenderStream VDACTr8 motherboard with eight double-wide PCI-E slots. It's got 12 gigs of 1333MHz DDR3, an Intel X25-E 64GB solid state drive, and a Seagate 500GB 7200RPM hard drive. For now it's running Ubuntu 10.10 x64 Maverick. We figured RAID configurations, etc. weren't necessary, as redundancy isn't a major issue.
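
As a quick sanity check on the core count and teraflops figure, here is a rough sketch in Python. The clock speed is the 1564MHz that cudaHashcat reports for each card in the log further down; the 2 floating-point operations per core per clock is an assumption (the usual way peak single-precision numbers are quoted), not something measured on this box.

```python
# Back-of-the-envelope peak throughput for the GPU array described above.
NUM_CARDS = 8
CORES_PER_CARD = 512          # CUDA cores per GTX 580, from the post
SHADER_CLOCK_HZ = 1564e6      # clock reported by cudaHashcat in the log in this post
FLOPS_PER_CORE_PER_CLOCK = 2  # assumption: one fused multiply-add per clock

total_cores = NUM_CARDS * CORES_PER_CARD
peak_tflops = total_cores * SHADER_CLOCK_HZ * FLOPS_PER_CORE_PER_CLOCK / 1e12

print(f"total cores: {total_cores}")
print(f"peak single-precision: {peak_tflops:.1f} TFLOPS")
```

Under those assumptions the estimate lands a bit above 12 teraflops, which lines up with the figure quoted for the machine. It is a theoretical peak only and says nothing about real cracking speed.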

We've already got 4 GTX 295's from the old server, so we'll have a total of 8 GTX 580's and 4 GTX 295's to start off with once the Magma expansion slots are in.

From a pure performance perspective on cracking NTLM/MD5 hashes, we haven't finished the fine tuning, but here's what we've got so far. This is using the latest Multiforcer 0.80 Alpha4, which now supports multiple GPUs. One thing I'll say is that it's pretty buggy, so use at your own risk, but we averaged around 22.1 billion passwords per second on 8 GTX 580's:

    0: GPU: 2998.01M/s
    1: GPU: 2996.78M/s
    2: GPU: 2997.31M/s
    3: GPU: 2996.64M/s
    4: GPU: 2633.48M/s
    5: GPU: 2567.71M/s
    6: GPU: 2355.49M/s
    7: GPU: 2568.05M/s
    TOTAL : 22113.47M/s

Looking at the above, we're not getting as much performance from the last four GPUs as from the first four, which seems to be a limitation of the Alpha version of Multiforcer. Using oclHashcat on MD5's we averaged around 14.2 billion per second, as shown below:

    root@newcuda:/opt/oclHashcat-0.25# ./cudaHashcat64.bin example.hash -n 160 -1 ?l?u?s?d ?1?1?1?1 ?1?1
    cudaHashcat v0.25 starting...

    Digests: 6494 entries, 6494 unique
    Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes
    Platform: NVidia compatible platform found
    Device #1: GeForce GTX 580, 1535MB, 1564Mhz, 16MCU
    Device #2: GeForce GTX 580, 1535MB, 1564Mhz, 16MCU
    Device #3: GeForce GTX 580, 1535MB, 1564Mhz, 16MCU
    Device #4: GeForce GTX 580, 1535MB, 1564Mhz, 16MCU
    Device #5: GeForce GTX 580, 1535MB, 1564Mhz, 16MCU
    Device #6: GeForce GTX 580, 1535MB, 1564Mhz, 16MCU
    Device #7: GeForce GTX 580, 1535MB, 1564Mhz, 16MCU
    Device #8: GeForce GTX 580, 1535MB, 1564Mhz, 16MCU
    Device #1: Kernel ./kernels/4318/m0000.sm_20.64.cubin
    Device #2: Kernel ./kernels/4318/m0000.sm_20.64.cubin
    Device #3: Kernel ./kernels/4318/m0000.sm_20.64.cubin
    Device #4: Kernel ./kernels/4318/m0000.sm_20.64.cubin
    Device #5: Kernel ./kernels/4318/m0000.sm_20.64.cubin
    Device #6: Kernel ./kernels/4318/m0000.sm_20.64.cubin
    Device #7: Kernel ./kernels/4318/m0000.sm_20.64.cubin
    Device #8: Kernel ./kernels/4318/m0000.sm_20.64.cubin

    [s]tatus [p]ause [r]esume [h]elp [q]uit => s
    Status....: Running
    Mode.Left.: Mask '?1?1?1?1' (81450625)
    Mode.Right: Mask '?1?1' (9025)
    Speed.GPU*: 14.1G/s
    Recovered.: 0/6494 Digests, 0/1 Salts
    Progress..: 5368709120/735091890625 (0.73%)
    Running...: 0 secs
    Estimated.: 1 min, 5 secs

    5e5828c277f461d6dc3fbb1eac4e2fa6:99saso
    b27800b6b7115fb4b7a5ce271ec0a8b2:E.Isso

    [s]tatus [p]ause [r]esume [h]elp [q]uit => s
    Status....: Running
    Mode.Left.: Mask '?1?1?1?1' (81450625)
    Mode.Right: Mask '?1?1' (9025)
    Speed.GPU*: 14.2G/s
    Recovered.: 2/6494 Digests, 0/1 Salts
    Progress..: 27514634240/735091890625 (3.74%)
    Running...: 2 secs
    Estimated.: 1 min, 3 secs

    992fd66b4543e0ac23f3193d51b746ee:qmmnsc
    467976bdd5c0234978aabbd21f303e5c:nzutus
    3e2dfae2aa872916340eb22d9e79f98c:derby_
    7c2826cc67833595ee6bfcf064daf21e:save60
    1c8f19ba41b153a0b3d6fac48d4e5f5e:30re50
    fcc3304d02b1ae024a064f8da62c736c:zone50
    e5ce937a7ae5ae9d5783687fe5ce4fe8:jade40
    39f0766d6ca8c92b43d1bb7f20963e1b:p3nk12
    5276f23962da83c8a2fb58e4a722f6fc:rozee5
    8c33a8c9d760f50d3fbd09928a806b0e:h8ww78
    6b576027375ad4912263e95319f351bd:giofio
    4d569a18c6fa26bbe351e6b2ac209e7b:keribu
    06cd562bfbe200ff0bc5791592706d71:lopiaz
    af6f5ab83cb39d2433b12c7eec98d859:willy$
    fef4240115cd0b68ce7f8d76fea7e8c8:deni--
    a54b35ab1abf1ae437dad96fb0f22c4b:tomi10
    1459ccf0940e63051d5a875a88acfaaf:pigi00
    7becb9424f38abff581f6f2a82ff436a:sail00
    93874433619186c89a1ec8f41600c5be:saml10
    c9944c427745e47c3d53ed63a70658d1:lilla3
    844016a5a4e54517f9fa041fffafbdc8:blux13
    562c968651f2ce9c3a5f45d6acf69357:kimi73
    3737c01191dc66b6f756ba265b6b706b:minu69
    9ddbe0017d767560ed0fba20f63958e4:lucdea
    820079d5ccdf6e725f612b72c404fa4d:456ira
    07c1252862faff7a560ba64cbc945fe5:aleyle

    [s]tatus [p]ause [r]esume [h]elp [q]uit => s
    Status....: Running
    Mode.Left.: Mask '?1?1?1?1' (81450625)
    Mode.Right: Mask '?1?1' (9025)
    Speed.GPU*: 14.1G/s
    Recovered.: 28/6494 Digests, 0/1 Salts
    Progress..: 193965588480/735091890625 (26.39%)
    Running...: 17 secs
    Estimated.: 48 secs

    [s]tatus [p]ause [r]esume [h]elp [q]uit =>

So far so good. After killing two 15 amp surge protectors we upped it to three 30 amp surge protectors and haven't blown one yet :) More to come as we keep expanding the overall system and keep tweaking as we go. The major limitation seems to be the software at this point. There are some Windows alternatives we haven't tested yet that should get us upward of 33-38 billion per second with our current configuration.
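
To put the two benchmark runs in perspective, here is a small Python sketch that re-derives a few of the numbers from the logs. The per-GPU rates and the 14.2G/s speed are copied from the output in this post. The character set sizes (26 lowercase, 26 uppercase, 33 special, 10 digits) are assumed to be hashcat's built-in `?l`, `?u`, `?s` and `?d` sets, which agrees with the 81450625 and 9025 mask sizes hashcat prints.

```python
# Per-GPU rates (millions of hashes per second) from the Multiforcer run.
multiforcer_mhs = [2998.01, 2996.78, 2997.31, 2996.64,
                   2633.48, 2567.71, 2355.49, 2568.05]

total_mhs = sum(multiforcer_mhs)
first_four = sum(multiforcer_mhs[:4]) / 4   # average of GPUs 0-3
last_four = sum(multiforcer_mhs[4:]) / 4    # average of GPUs 4-7
shortfall_pct = 100 * (1 - last_four / first_four)

# Keyspace for the oclHashcat mask: custom set -1 ?l?u?s?d over six positions.
# Assumed set sizes: 26 lowercase + 26 uppercase + 33 special + 10 digits.
charset_size = 26 + 26 + 33 + 10
keyspace = charset_size ** 6
speed_hps = 14.2e9                          # Speed.GPU* from the status output
full_sweep_secs = keyspace / speed_hps

print(f"Multiforcer total: {total_mhs:.2f}M/s")
print(f"last four GPUs run {shortfall_pct:.1f}% slower than the first four")
print(f"keyspace: {keyspace} candidates")
print(f"full sweep at 14.2G/s: {full_sweep_secs:.0f} secs")
```

The keyspace matches the 735091890625 total in the Progress line, and a full sweep at 14.2G/s works out to a little under a minute, in line with the estimates hashcat prints. The last four GPUs come out roughly 15 to 16 percent behind the first four in the Multiforcer run.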
