Proxy + SQL + Apache + NTLM = User Web History

Posted by mkeadle

ABOUT

Now that NTLM authentication is seemlessly working on the proxy, it seemed time to take another look at mod_ntlm for Apache2 and what, if anything, it could be useful for. This first idea relates back to the proxy and is more of a proof-of-concept than anything else since I’m not so sure this thing has any real end-user value. This project relies on Squid working with NTLM, so if you don’t have that in place you won’t get very far here.

SQUID TO SQL

The first thing to do is start logging proxy traffic into an SQL database. I wrote a small service to do this that tails Squid’s access.log, parses it for interesting fields, and finally shoves it all into a DB. Originally enough, it’s a PHP script called squid2sql. If the script were to run without limitation, you’d end up with a database that was huge and would take time to query, so there’s also a need for a system to limit the amount of data present. For that, another small PHP script called stale_proxy_sql runs as a nightly cron job and deletes any records older than it’s internal variable window, currently set to 2 weeks. So with two small scripts, we’ve got a tailing 2 week log of proxy traffic in SQL.

Apache and NTLM

Here’s where the NTLM authentication comes in. Head over to the mod_ntlm homepage and grab a copy. Gentoo doesn’t currently have an ebuild for mod_ntlm, so I had to install by hand, but there may be RPMs floating around somewhere if that’s your game. In order for mod_ntlm to work, you’ll need to have the server running Apache joined to the domain. The process used in Squid + NTLM can be used for that.

The neat thing about mod_ntlm is how easy it is to setup once it’s installed. All mod_ntlm functionality is controlled through .htaccess files (link for Apache2). Here’s a copy of the mod_ntlm enabled .htaccess I’m using. Once in place, users will be auto-authed by NTLM and you’ll be able to grab their username from the REMOTE_USER environment variable.

GUI INTERFACE

Raw data and NTLM authentication are done. Now we need an interface since that was the main reason for starting this whole thing. All this requires is to kick into cruise control and whip up some quick DB queries with some more PHP glue. Heck, I’ll even (try to) make it look half respectable:

Listing 1. Webtrack Screenshots (click for larger images)

Initial view

Sorted by date

Some URLs in the screenshots have been blurred out to protect some internal hostnames/IPs.

I created a virtual host in Apache so the only thing users have to type in their browsers is “bookmarks” and they’ll see their proxy history for the past two weeks, with no manual authentication! I would post the PHP that drives the web interface, but… I don’t quite feel like it. It currently executes a SQL SELECT for each sort type and needs to be rewritten to feed a single SELECT into an array that can be sorted on it’s own. If you’re interested in the code as it is I can make it available.

CONCLUSION

That’s about it. So is it useful? I’m not so sure. It seemed like a neat idea and didn’t take any time to try out, so as an exercises in NTLM web authentication it proved useful. Before this could be released to general users there would be a few things to consider. At the moment, the Apache server and proxy server are different machines, so database queries are going across the wire. This could be sped up by using Apache dirrectly off the proxy, but if the proxy is in a DMZ it won’t be able to join the Windows domain. Also, my original idea was to use a 5 week window of traffic (an entire term), but some dirty calculations showed that could produce up to 2 million DB entries and query times would not be acceptable (at least with our current hardware). With a 2 week window the DB is large but managable, but if many people hit it at once I’m not sure how it would react. More importantly, I’m not interested in SQL queries stealing that much processor time away from Squid. In the end this will probably just get folded into MisAdmin, the general purpose IT tool I’m constantly working on, as a way to track and research web usage by all network users, whatever that’s good for.

Outside if this specific use, there are still a lot more possibilities with mod_ntlm and restricting access to Apache hosted sites to valid domain users. Maybe the biggest drawback to mod_ntlm is it’s inability to lookup group membership information, but it is able to grant/deny access on a per-user basis. Of course, if you’re primarily using IIS, restricing access to domain users is a piece of cake, you just have to suffer through using IIS.

Fin.

Comments are closed.