About the meat for the BBQ

Auric

New Member
Joined
Jun 4, 2014
Messages
5
I have bought a few books about Power Pivot (Both written by Rob, Alexander, Ferrari, Russo) and I have to say they all lack a serious, extensive discussion on the subject of how to prepare the information on which the PP analysis will be based (Very little on this). We are always talking about 000s, millions of rows and what have you.....but those million rows about the info I need are not always available..... Say for a minute that the info is not as neatly prepared as I would wish.....What then??

Say I want to analyze murders occurring in a certain city, but I only have bits and pieces....How is it recommended to approach the problem? Shall I simply Google the info, use Power Query, etc. I would really like to read plenty about these kind of experience which ALL of us have had...

And that is my complaint....all these books about the analysis' potential of PP, which include very little about how to put together the info, is like talking about how to grill a perfect BBQ and prepare super sauces.....without having the meat to grill! I just want to know how experienced PP users have approach this problem..
 

Excel Facts

How to find 2nd largest value in a column?
MAX finds the largest value. =LARGE(A:A,2) will find the second largest. =SMALL(A:A,3) will find the third smallest
Amusing subject line, I totally assumed this post would be spam :)

I think the problem is that it is such a big "it depends". Both on what the source of your data is... and what tools you have at your disposal. Somebody with an IT dept running Oracle or SQL Server is different than a solo consultant with... notepad.exe.

Typically when folks want to analyze data in Power Pivot... they HAVE the data. And one of the great things about Power Pivot is that it *can* pull the data from multiple sources. If your murder data is in a CSV... you are pulling that into power pivot and get to work analyzing! If it needs some "massaging", then maybe Power Query is great for you, but maybe you instead need to write an SSIS package to pull it into a SQL Server... it depends :)

I can tell you that whenever the data is in SQL, I am very, very happy.
 
Upvote 0
Hi scottsen, and thanks for you input!

Indeed, when you have the info....well, you have it and proceed to analyze it with PP...... Say you are a consultant Co and your client asks you to analyze certain info which they provide, in order to determine in which part of town they should open their first alligator meat hamburgers joint....well, you just do it using PP (and charge them accordingly:cool:); end of story. But think for a minute now that the same client now asks you to do the same job for him, with the difference that they don't have the info, they don't know what info is pertinent and they don't know where to look for it.....oh yeah, and they are ready to pay you a bundle for the job... Do you tell them, get lost and come back when you have all the info neatly prepared, exactly how I need it? Or, you use your creative powers to put together the info and then use PP to analyze it?

I realize some may think that this is a non-issue because they are accustomed to have the info available, but that is in NO way always the case by far. Once again, I just want to hear from experienced PP users who have encountered this problem.....How have they approached it?
 
Last edited by a moderator:
Upvote 0
Well, if its boat loads of money, the answer is "call your buddy Scott" :)

I just don't see how you are getting away from "it depends". I have had clients with data in old legacy databases not supported by Power Pivot. I have had clients with data in the cloud through only available through custom api's. Or in SQL but with 500 column wide tables (which Power Pivot really hates). The solution is going to be different depending on where the data lives and how you can access it.

If the data doesn't even EXIST, and you are developing a solution to gather and store the data... well, certainly not specifically a Power Pivot problem, but if at all possible, I will be hoping the data ends up in SQL. You are still going to have interesting questions around server location (cloud? in house?), how the data gets refreshed (both the SQL and your power pivot workbooks).

The (somewhat sad, imo) reality is that when it comes to actually delivering the workbooks, Sharepoint is probably the best option. And the existence of a sharepoint farm is going to influence things as well (if it's in house, it implies access to a SQL server, if its O365 it is going to constrain your data refresh options).
 
Upvote 0
Wait, wait! you just addressed the conundrum describing the type of problem I refer to by listing some of the issues you have confronted.

I beg to differ when you state that non extant data is not a PP problem. Actually I think it very much part of the task at hand, especially if you are pro active and offer a comprehensive service.

Regarding Sharepoint, personally, I don't think it is so vital to use it. To me, the "refreshing" element is probably the most important feature, but again, I think you can deliver the goods without using Sharepoint. I have performed quantitative risk analysis for over a decade now, and while there are options similar to Sharepoint, I never felt I absolutely needed them to present and deliver the work in a suitable fashion.

Furthermore, I do agree that the "it depends" holds but at the same time it is too broad to shed light into the problem. You write: "The solution is going to be different depending on where the data lives and how you can access it" Could you share with me the specific procedure you followed to successfully prepare the data for a specific client when the data was "not neatly arranged", before you could carry out the PP analysis?
Thanks
 
Upvote 0
There is a pretty typical scenario where you have a large power pivot model, and build a bunch of reports around it. Maybe some filtered... so the North West Sale Manager is only seeing... northwest sales. That gets really un-fun without Sharepoint to server up html to your sales people. Certainly there are other scenario where Sharepoint is not a requirement... but still nice, if only to have a single version of "the truth" (oh, you are looking at the report from tuesday? no silly, you need the thursday one...)

3 examples where you might want to massage the data before it gets to power pivot:

1. Creating a calendar table: They frequently have a bunch of extra calculated columns. Yes, you can typically do that in power pivot as well, but then you are duplicating that work for each report. Also, calculated columns don't perform as well (I believe they are basically NOT compressed like typical columns). In general, its always "better" to add calculated columns in SQL, compared to Power Pivot.

2. Pivoting (or is it unpivoting?) data: Say I have 8 columns of numbers... uh, sales in 8 different currencies. Typically it is better to instead of a single "Amount" column, and add a "currency type" column. This is because Power Pivot works "better" with more rows, less columns. Just for performance reasons.

3. Aggregating Data: Say you have daily sales, but your company never looks at reports to that detail level. Weekly (or monthly) is adequate. In that case, it is rather nice to do the aggregations in SQL, and just pull weekly/monthly data into Power Pivot -- again, just for performance reasons.

You will notice all 3 of these are... sorta assuming the data lives in SQL, cuz that is the best place for it :)
 
Upvote 0
Thanks for sharing!

One last question: Should the data not be in an SQL and providing for different scenarios.... what would be your general approach in one or two specific cases?

In order to get wiser, I will continue asking others...Where is the beef:confused:?
 
Upvote 0
Can I say... "it depends" ? :)

A recent client had data in the cloud at various sites (AppAnnie, ExactTarget, google analytics, etc). I wrote .net code to pull the data, then output it as either CSV... or directly to SQL. Obviously the code to pull from the services was very "it depends", and the 2 output formats gave a reasonable amount of flexability. In these cases the only interesting decision point was at what level to aggregate the data ("daily" for this customer).

But I'm not sure this really gets to the heart of your question since the data already existed... somewhere. I actually can't think of a recent situation where somebody wanted my assistance gathering the data -- that's not really my space currently.

I was a developer at Microsoft for many years, and certainly there were interesting decisions around reporting health/status of various systems... storing it somewhere then later analyzing that data. But uh... it was Microsoft, there were more SQL servers than you could shake a stick at :) And certainly in those cases, people are very NOT worried about having the best format the data for power pivot consumption -- if required, that is done in some kinda of nightly processing job.
 
Upvote 0
Thanks again.

You have discussed something which in my judgment is very important, but it seems that it is shunned, as it should be a given that the information is there, very neatly arranged, exactly how I need it. Regretfully, it is by far NOT always the case.

Using the same analogy, it would be like inviting friends to a BBQ to who you lecture about all you can do, tricks, how to prepare an evil sauce, in order to grill the perfect BBQ. And suddenly, one of your friends -already hungry- timidly asks you.....where is the meat? Well, you tell him that that is a non-issue question because the meat should be there....and if not, then I will kick everyone out of my garden.... :crash:because according to me, the point is not to have the meat ready, cut, marinated and ready to grill, but to educate everyone about how to prepare the perfect BBQ...

I found more on this important subject reading your comments than in ANY of the 4-5 books I have about PP.!
 
Upvote 0

Forum statistics

Threads
1,224,008
Messages
6,175,918
Members
452,684
Latest member
RRaively1

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top