Using PowerPivot to Analyse Windows Search Results
Reposted from Chris Webb's blog with the author's permission.
My whole life is on my laptop hard drive, and as a consequence my laptop hard drive has a lot of stuff on it. Luckily, when I need to find something, there’s always Windows Search to help me out - but even so a single search query can return a lot of data. How can we make sense of all the data in the Windows Search Index? Well, we can use PowerPivot of course!
I first got the idea for doing this when I saw ‘Microsoft OLE DB Provider for Search’ in my list of OLE DB Providers; a quick look around on the net revealed the following useful sources of information about it:
http://msdn.microsoft.com/en-us/library/bb231256(v=VS.85).aspx
http://www.thejoyofcode.com/Using_Windows_Search_in_your_applications.aspx
So if you can query Windows Search using SQL, I thought, then I should be able to take the data that is returned from running one of these SQL queries and load it into PowerPivot. And after a lot of trial and error, I managed it – and it works rather well. Here’s how to do it…
First of all, you need to make sure you have the OLE DB Provider for Search installed. If you don’t, you need to download and install the Windows SDK. Then you can open up a new Excel workbook and open the PowerPivot window. Next you need to create a new connection by going to the Home tab, clicking the Get External Data/From Other Sources button, and then clicking the Others(OLEDB/ODBC) option.
Next, type in the following connection string:
Provider=Search.CollatorDSO;Extended Properties=Application=Windows
Do not try to click the Build button and select the provider from the list – when I did this, I got the error “Provider is no longer available. Ensure that the provider is installed properly”. Next, choose the “Write a query to specify the data to import” option (again, if you choose the “Select from a list of tables…” you’ll get an error) and enter your SQL query.
Here’s where the fun begins. From the two links above, you can see that there are loads of possibilities as to what you can query. Here’s a sample query that returns a list of all the files in the Documents folder on my c: drive and below, along with their file types, the folder path and the file size in bytes:
SELECT
System.ItemTypeText,
System.Size, System.ItemFolderNameDisplay,
CAST(System.ItemFolderPathDisplay as DBTYPE_BSTR),
System.DateCreated,
System.DateAccessed,
System.FileExtension,
CAST(System.ItemName AS DBTYPE_BSTR)
FROM SYSTEMINDEX
WHERE SCOPE='file:C:UsersChris WebbDocuments'
Notice that I’ve had to cast some columns to DBTYPE_BSTR – I found that if I didn’t do this, the columns simply didn’t appear in the query results in PowerPivot. Other things you can do here in the query include searching for all items that include particular words or phrases, or which are above a certain size, or have a particular file extension.
With that done, you’re good to go. In a pivot table you can slice and dice all the data returned to your heart’s content. Here, for example, are the top five files with the .ABF extension (ie SSAS backup files) from my c: drive:
With a separate time dimension table joining to System.DateCreated you can do even more. Here’s the total size of files on my c: drive in bytes broken down by the year they were created:
You can also use the DAX time intelligence functionality. I added a running total calculation that shows the growth in the total size in MB of all files, over time, based on the creation date of each file. Here’s the formula:
=CALCULATE(SUM(Files[SYSTEMSIZE]), DATESBETWEEN(Time[Date], BLANK(), LASTDATE(Time[Date])))/1024/1024
This chart shows that running sum from November 2008, when I bought the laptop, to today:
There are plenty of tools out there that help you analyse this type of data but I doubt any of them can do what the PowerPivot/Excel combo can do. And it’s this kind of personal BI that PowerPivot makes easy. The only thing missing is an API which would allow you to build the SQL query used here dynamically: imagine having an interface where users could type their own search terms and then be able to analyse the results in PowerPivot at the click of a button. Hopefully PowerPivot will get an API soon. And as I’ve said before in the past, wouldn’t it be cool if Bing could do this kind of thing with web search results and data found on the web?
Chris has been working with Microsoft BI tools since he started using beta 3 of OLAP Services back in the late 90s. Since then he has worked with Analysis Services in a number of roles (including three years spent with Microsoft Consulting Services) and he is now an independent consultant specialising in complex MDX, Analysis Services cube design and Analysis Services query performance problems. His company website can be found at http://www.crossjoin.co.uk and his blog can be found at http://cwebbbi.spaces.live.com/ . |