|Using PowerPivot to analyze web access log in "Combined Log" format (CLF)|
|Written by PowerPivot-Info.com|
|Monday, 15 November 2010 15:00|
Majority of people who have a website or publish a blog wants to know how many people are visiting your site and what are they reading. There are few free services (like Google Analytics) that provide that information, but these free services have limitation - they usually provide just "human" visitor information. That is they track visitors who accessed your site from different browsers (PC, smart-phone, etc.) and triggered Java script execution. But sometimes you need to know more - you want to know ALL visits to your site - from humans and from bots (or robots). That information is accessible just from your web service log. If your web server is running on Apache (Unix), then it is very likely that your site has access log in "common log" format or in "combined log" format. Website access log can contain thousands or even millions of records and you need a tool that would help you to read and analyze them.
This article will demonstrate how easy is to create PowerPivot application that will create multiple helpful reports from website access log stored in Combined Log Format (CLF). We assume that you have at least basic knowledge on how to do simple things in PowerPivot (like loading data, create linked table, creating relationships, etc.), so we will not provide step by step guide on how to do these things.
Combined Log Format (CLF) description
In CLF each line contains 9 fields separated by space:
Here is sample line from access log:
220.127.116.11 - - [30/Sep/2010:05:06:53 -0700] "GET /analysis-services-implementations/68-live-demos/1088-microsoft-sql-server-2008-data-mining-services-table-analysis-tools-for-the-claud HTTP/1.1" 200 29012 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
You can read more about CLF here.
Loading log data into PowerPivot
When loading web access log into PowerPivot, you should be aware of following:
As we want to have control on how web access log data is imported by PowerPivot we will use schema.ini file where we will specify each field.
So lets start loading our log file data into PowerPivot:
---- End of schema.ini file. Do not copy this line or lines below
Edit above file and replace first line "[ssas-info.txt]" with the name of your log file.
A Schema.ini file has been detected in the current import folder "<your folder location>". Settings from this file will override your current import settings.
Note: After loading this table into PowerPivot we should create relationship between our web log table and this status table on field Status.
Creating calculated PowerPivot columns
At this point we need to prepare web log access data for analysis:
To fulfill above requirements we will create PowerPivot calculated columns using following DAX:
Calculated column "App" deserves special explanation. In this column formula we are searching for different substrings in the UserAgent field and based on the fact if that substring is found we assign application name. Normally you would write following DAX formula to check if a string contains specific substring:
But we used following DAX logic to check if substring exist:
IF(SUBSTITUTE([UserAgent], "<SubstringToFind>","") <> [UserAgent],<AppNameIfFound>,<AnotherSearchIfNotFound>)
In our formula we avoided using IfError DAX function that is known to have performance issues. Performance here is very important to us, because calculation of this column is very slow. Please note that even after all performance tuning it takes about 3.5min for this column to be calculated.
As we created calculated columns, some of the base columns now can be hidden from Pivot view. We can do that using PowerPivot option Design->Hide and Unhide". Image below shows columns we can safely hide:
For table "Status" we can hide all columns "In PivotTable", as we created calculated column StatusDescr in our main web access log table. When we hide all columns in table, that table will not appear in PivotTable field list view.
At this point we should create our first Pivot table and add to our application few calculated measures that will be used in different reports
Now we are ready to create reports based on data we just loaded.
Analyzing web log data
It took some time for us to load data into PowerPivot and prepare it, but now we have full flexibility to analyze our web stats data. Here is sample dashboard that helps me understand what applications are downloading data from my website and how much (note some numbers are masked)
And here is example of report that shows request counts by http status codes:
There are many other reports that can be built from this access log and it all depends on what would you like to analyze and how good you are with carts and pivots. And just like we added Status table, for your reports you might want to include additional information that would help you analyze your web access log. For example, if you have a local store database, you might want to add weather information to your analysis to see how it affects web traffic. Or you might consider buying IP address geolocation information and incorporate that in your analysis.
PowerPivot is extremely powerful tool that can be used to analyze millions of rows of data. It allows us to build applications with a very few lines of code (DAX scripting) that can be easily expanded by adding data from other sources (like we did with status code description). Any time you are dealing with data and want to understand it better, you should consider using free tool - PowerPivot.
- PowerPivot / PowerView Excel Workbook Optimizer
- DAX Studio - Free Excel Add-in
- How to solve issue when PowerPivot for Excel crashes on all workbooks
- Using PowerPivot to analyze web access log in "Combined Log" format (CLF)
- Q: I have integer amount that represents duration in seconds. How can show this amount in format "h:mm:ss"?
- Step by step guide on installing PowerPivot for SharePoint on a single machine
- List of suggested datasets to test PowerPivot
- How to install PowerPivot for Excel and list of know issues
- List of PowerPivot DAX functions with description
- Microsoft Virtual Lab: PowerPivot for Excel 2010 Introduction
- Learning PowerPivot and DAX
- List of PowerPivot DAX functions (short)
- PowerPivot's impact on BI pros?