General question about binning data in excel

abberyfarm

Well-known Member
Joined
Aug 14, 2011
Messages
733
Hi there,

I have three columns of data, 100,000 plus rows, and I am wondering how would you go about binning the data into say 10 different categories depending on the values in the columns?

Would you use a macro?

Thanks
 
thanks, that doesn't help me though :-)

what final result/format/table would be wanted? simple assignations to bins as you've answered already or some sort of summary
 
Upvote 0

Excel Facts

Why are there 1,048,576 rows in Excel?
The Excel team increased the size of the grid in 2007. There are 2^20 rows and 2^14 columns for a total of 17 billion cells.
I don't know how to answer the question you asked, so let me respond to the one I can.

Once you've binned the data and assigned a state to each bin, then you could say this:

State 1 appears twice (ignoring the final state). The first time it changes to state 2, and the second time it remains in state 1. So in state 1, there's a 50% chance it will remain in that state, and a 50% chance it will change to state 2.

Similarly, state 3 will remain in state 3 2/3 of the time, and 1/3 of the time will change to state 4.

Once you have the transition probabilities, you can make a state machine that behaves (statistically) the same way.

So the objective of binning is to translate state vectors (speed and VSP in this example) to discrete states, so you can proceed to the next step.
 
Upvote 0
thanks, shg

if the exercise is like that, I'm thinking it would be inefficient to load a worksheet with tens of thousands of intermediate results - unless needed for auditing - and instead simply go directly to the final result (whatever that might be, I don't know)

I don't like worksheets with multiple columns each with tens of thousand of formulas :-)
 
Upvote 0
You have a reference system defined by a list of state vectors; one way or another, you need to extract a model from that.
 
Upvote 0
sure, and that can be done without filling in many tens of thousands of cells with formulas or intermediate results. regards
 
Upvote 0
Well, jump in!

A pivot table with groupings on the two state variables (one as row, one as column) would be a good start (that I can do). In the body of the table, you need the count of transitions to each state. I'm no pivot table wiz, and would be happy to see how it's done.
 
Last edited:
Upvote 0
well, I don't know the real question yet!

and I'm thinking it doesn't need to be done on a worksheet (such as a pivot table), perhaps

SQL might be OK on its own - so yes, a pivot table or maybe a query table, or ADO - or use arrays in VBA, or whatever else is best
 
Upvote 0
well, I don't know the real question yet!

1. Given a list of state vectors, bin the state variables in some fashion. (The purpose of binning is to reduce the total number of states; otherwise, with continuous variables, every vector might be a unique state).

2. Assign each vector to a state number based on those bins.

3. Create a transition table that shows the count of transitions of each state to every state.
 
Upvote 0
my reference to the real question is the question in this thread

I don't want to hijack/divert the thread - I'd like to leave this thread to the OP's question
 
Upvote 0
Hi Guys,

First of all I would like to thank you all for your help. This forum is a great resource.

Shg is correct in that I am working on a markov chain analysis and he helped me out with a method for creating transition probabilities on this thread http://www.mrexcel.com/forum/showthread.php?t=576628
Just in relation to this shg, I decided to use a different transitional state in the end, not speed. I am using change in driving mode. i.e acceleration, deceleration, idle and cruising. I adapted your method to do this.
Example
trasn.png


This thread is not directly linked to markov chain aspect of my work. It precedes it and is just to segment my data into different categories before I proceed to the markov chain part.

So the purpose of this thread is that I want to bin the data into 17 different operating mode bins according to this. FYI, VSP = vehicle specific power.
bins.png


Then I will perform a principle component analyses (using excel stat) on these bins to reduce the data dimensions and then I will cluster the data in different driving patterns based on the most significant principle components.

But at the moment I just focusing on binning the data.

Thank for the help again. Any suggestions?
 
Upvote 0

Forum statistics

Threads
1,224,598
Messages
6,179,818
Members
452,946
Latest member
JoseDavid

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top