Exploring the Microsoft/NASA Day of Data Dataset from Day of Data
Today (November 1, 2020) marks 20 continuous years of having a human on the International Space Station (ISS). For today's video, I explore the NASA data set to see: (a) Has there really been someone on the ISS every day? (b) What is the most people aboard the ISS? (c) What is the fewest number of people aboard? (d) What is the most common crew size? Along the way, you will see some Power Query, Date.AddDays, the FILTER function, and more.
Today (November 1, 2020) marks 20 continuous years of having a human on the International Space Station (ISS). For today's video, I explore the NASA data set to see: (a) Has there really been someone on the ISS every day? (b) What is the most people aboard the ISS? (c) What is the fewest number of people aboard? (d) What is the most common crew size? Along the way, you will see some Power Query, Date.AddDays, the FILTER function, and more.
Transcript of the video:
Learn Excel from MrExcel podcast, episode 2377.
Twenty years in space.
Hey, welcome back to the MrExcel netcast. I am Bill Jelen.
Today, Sunday November 1st 2020: Microsoft and their day of data promotions says that they are celebrating 20 years of human beings living and working aboard the ISS. 20 consecutive years with someone there every day.
They have this great free data set and even a little data visualization contest.
Although it's due on November 2, 2020.
But you want to go download this awesome BaseData.xlsx data set. It's just a great data set that I haven't seen.
It's based on some NASA data. Microsoft cleaned it up a little bit.
It struck me that would be interesting to see if there have really been someone on the ISS every single day since November 1st of 2000. Just check this out.
This is actually one of the questions in the data visualization challenge.
I think it's probably - at least the way that I did it - is one of the tougher ways.
I'd be interested if you have a much easier way to go.
In that base data data set there's something called Mission by Crew.
That says who it was, what spacecraft they launched on, spacecraft they returned on. Did they change the craft? How long were they up?
The launch date and the return date and then what space station did they visit.
Alright now the first thing when I when I ran this.
I came up with a whole bunch of days in this year when there was no one on the station.
And that didn't make sense.
I realized that what we have to do is we first have to go fix something in their data. So look for all of the ISS records.
And then where the mission result does not say Return but says Active.
Active and in that data that you download the return date is blank for those three.
Because, well, they don't know when they're going to return.
Actually NASA probably knows when they will return.
And what I did is I took these three cells and changed the return date to tomorrow =NOW() + 1 So in other words they're they're still up there.
Alright and with that one change then I was able to use this data set here Mission by Crew to figure out how many people were there on every single day.
Now a little bit of orbital mechanics here.
Generally, generally when they launch they'll arrive at the ISS the next day.
It is possible to arrive the same day but if there's a scrub because of weather, Then they have to wait 20 or 30 days.
With the usual approach of orbiting the earth and then catching up to the ISS you generally get there the next day.
And that allows if there's a scrub - if they can't launch today, they'll be able to launch tomorrow.
Maybe like 20 minutes earlier or something like that.
Right so my assumption here is that they're going to arrive at the ISS the day after they launch.
And they're going to leave the ISS the day before they actually return to Earth. The flight back to Earth is a little bit quicker.
But those are just going to be my assumptions straight away.
And in this data set it has all kinds of flights whether they're going to the ISS or not.
So the first thing I did was I went into Data, From Table or Range, and created a little Power Query here.
And the first thing I did was I filtered the space station to only say ISS.
So in other words if they didn't go to the ISS I don't want those records.
I removed a whole bunch of columns that I don't really need.
And I came down to who was the the crew member, when did they launch, and when they returned.
Now these records are only flights to the ISS and back.
And then I added a new column here called Arrive Date that is the launch date plus one.
I used Date dot Add Days for that.
That would have been so much easier in Excel just equal B2 plus one. But Date.AddDays and then the Depart Date is Data.AddDays with negative one.
And then remove those extra columns so I just have Crew Name, Arrive Date, and Depart Date.
Close and Load and I end up with this data set here that gives me Crew, Arrive Date, and Depart Date.
A simple little MIN function here tells me the earliest date that someone arrived at the ISS And then =TODAY() gave me today's date.
That's eight thousand and three days in total from the earliest date to the latest date.
So a SEQUENCE function here. 8003 rows, 1 column. Starting at 12/5.
And I formatted that as a date. Alright.
And then the tough part: how many people were there and who were they?
Now, if this was Friday and it was a Dueling Excel podcast, I know that Mike would come up with some awesome formula and I would of course just default to VBA create some huge data set and a pivot table.
But I decided to try and use some of the new functions that we have.
So, here I use the FILTER function. Filtering on crew name.
Looking for is the arrive date less than or equal to this day?
Is the depart date greater than equal to that day? And send that whole thing into a TEXTJOIN.
So it gave me a list of who the people were.
The weird thing though is right here on the 16th after this departed on the 15th, We had some people.
Then a whole bunch of days where no one was there right So Six people flew up to the ISS.
They were there for about 10 or 11 days. And then no one for a whole bunch of days.
And then another crew arrived and then no one and another crew arrived.
And those #N/A's that was causing some problems because I was getting a #CALC! error.
The filter was returning a #CALC!. I used IFERROR to return N/A.
This time I use the same FITLER function and send it into COUNT to count how many items we have.
Now this time because it is a numeric column the #CALC! error is not being counted by COUNT.
So that worked out great.
And so if we kind of cruise forward here right up to November 1st of 2000.
Right there. November 1st of 2000.
From that date forward I am going to do Ctrl+Shift+Down arrow. And look down here in the lower right hand corner.
The minimum number of people on any day in the last 20 years is two. The max is 13.
It's pretty crazy that they have 13 people.
Let's see if we can find those instances of when there were 13. I will use the filter here. 13 people.
Alright so wow. Okay. So April 29, 2001.
It looks like a space shuttle arrived and then two days later someone took off. So there were 13 there just for a couple of days.
On this day -- March 27th, there were 13.
Wow but then there was a whole big string here for several days it looks like almost a month.
And then half a month where they had 13 people on the ISS. I wonder if that was crowded or not?
And then how about the loneliest time on the ISS? If we go here to filter.
Get rid of everything. But there's never just a single person.
There was always at least two. Wow lots of times that there were two people.
Probably one on the Russian module and one on the US side at a minimum, I would guess.
I don't know that for sure.
And then we'd have to do some other analyses with some XLOOKUP to get the the crew nationality. That might be interesting to do.
And then the other thing I was interested in. How many people are usually there?
So we'll clear this filter and up here I put the number of crew 0 through 13.
And then a FREQUENCY function here.
A single formula using the Count and then the bins array here.
So the most common number of people in the last 20 years on the ISS is 2903 days with 6. The second most common is 2622 days with 3.
And then interesting here. Two happened a lot.
Early on, before September 1st 2000 there were a lot of days with zero.
And then 10 seemed to happen a fair amount. Never had more than 13.
And at this point given that we don't have the shuttle which could take a crew of seven.
We just have while the crew dragon that can take four, We'll probably never get to that 13 number again, I am guessing.
So there you have it.
Congratulations to all the astronauts on the ISS who have ever been to the ISS.
And to the international partners for 20 continuous years of having someone in the ISS.
Hey, go download this data set. It's a great data set.
I don't know how long it'll be there after this promotion from Microsoft.
If you have some time today or tomorrow go ahead and build a dashboard I will put the URL for the the dashboard competition.
Well if you like this video please Like, Subscribe, and Ring the Bell.
Feel free to post any questions in the comments below.
I want to thank you for stopping by we'll see you next time for another netcast from MrExcel.
Twenty years in space.
Hey, welcome back to the MrExcel netcast. I am Bill Jelen.
Today, Sunday November 1st 2020: Microsoft and their day of data promotions says that they are celebrating 20 years of human beings living and working aboard the ISS. 20 consecutive years with someone there every day.
They have this great free data set and even a little data visualization contest.
Although it's due on November 2, 2020.
But you want to go download this awesome BaseData.xlsx data set. It's just a great data set that I haven't seen.
It's based on some NASA data. Microsoft cleaned it up a little bit.
It struck me that would be interesting to see if there have really been someone on the ISS every single day since November 1st of 2000. Just check this out.
This is actually one of the questions in the data visualization challenge.
I think it's probably - at least the way that I did it - is one of the tougher ways.
I'd be interested if you have a much easier way to go.
In that base data data set there's something called Mission by Crew.
That says who it was, what spacecraft they launched on, spacecraft they returned on. Did they change the craft? How long were they up?
The launch date and the return date and then what space station did they visit.
Alright now the first thing when I when I ran this.
I came up with a whole bunch of days in this year when there was no one on the station.
And that didn't make sense.
I realized that what we have to do is we first have to go fix something in their data. So look for all of the ISS records.
And then where the mission result does not say Return but says Active.
Active and in that data that you download the return date is blank for those three.
Because, well, they don't know when they're going to return.
Actually NASA probably knows when they will return.
And what I did is I took these three cells and changed the return date to tomorrow =NOW() + 1 So in other words they're they're still up there.
Alright and with that one change then I was able to use this data set here Mission by Crew to figure out how many people were there on every single day.
Now a little bit of orbital mechanics here.
Generally, generally when they launch they'll arrive at the ISS the next day.
It is possible to arrive the same day but if there's a scrub because of weather, Then they have to wait 20 or 30 days.
With the usual approach of orbiting the earth and then catching up to the ISS you generally get there the next day.
And that allows if there's a scrub - if they can't launch today, they'll be able to launch tomorrow.
Maybe like 20 minutes earlier or something like that.
Right so my assumption here is that they're going to arrive at the ISS the day after they launch.
And they're going to leave the ISS the day before they actually return to Earth. The flight back to Earth is a little bit quicker.
But those are just going to be my assumptions straight away.
And in this data set it has all kinds of flights whether they're going to the ISS or not.
So the first thing I did was I went into Data, From Table or Range, and created a little Power Query here.
And the first thing I did was I filtered the space station to only say ISS.
So in other words if they didn't go to the ISS I don't want those records.
I removed a whole bunch of columns that I don't really need.
And I came down to who was the the crew member, when did they launch, and when they returned.
Now these records are only flights to the ISS and back.
And then I added a new column here called Arrive Date that is the launch date plus one.
I used Date dot Add Days for that.
That would have been so much easier in Excel just equal B2 plus one. But Date.AddDays and then the Depart Date is Data.AddDays with negative one.
And then remove those extra columns so I just have Crew Name, Arrive Date, and Depart Date.
Close and Load and I end up with this data set here that gives me Crew, Arrive Date, and Depart Date.
A simple little MIN function here tells me the earliest date that someone arrived at the ISS And then =TODAY() gave me today's date.
That's eight thousand and three days in total from the earliest date to the latest date.
So a SEQUENCE function here. 8003 rows, 1 column. Starting at 12/5.
And I formatted that as a date. Alright.
And then the tough part: how many people were there and who were they?
Now, if this was Friday and it was a Dueling Excel podcast, I know that Mike would come up with some awesome formula and I would of course just default to VBA create some huge data set and a pivot table.
But I decided to try and use some of the new functions that we have.
So, here I use the FILTER function. Filtering on crew name.
Looking for is the arrive date less than or equal to this day?
Is the depart date greater than equal to that day? And send that whole thing into a TEXTJOIN.
So it gave me a list of who the people were.
The weird thing though is right here on the 16th after this departed on the 15th, We had some people.
Then a whole bunch of days where no one was there right So Six people flew up to the ISS.
They were there for about 10 or 11 days. And then no one for a whole bunch of days.
And then another crew arrived and then no one and another crew arrived.
And those #N/A's that was causing some problems because I was getting a #CALC! error.
The filter was returning a #CALC!. I used IFERROR to return N/A.
This time I use the same FITLER function and send it into COUNT to count how many items we have.
Now this time because it is a numeric column the #CALC! error is not being counted by COUNT.
So that worked out great.
And so if we kind of cruise forward here right up to November 1st of 2000.
Right there. November 1st of 2000.
From that date forward I am going to do Ctrl+Shift+Down arrow. And look down here in the lower right hand corner.
The minimum number of people on any day in the last 20 years is two. The max is 13.
It's pretty crazy that they have 13 people.
Let's see if we can find those instances of when there were 13. I will use the filter here. 13 people.
Alright so wow. Okay. So April 29, 2001.
It looks like a space shuttle arrived and then two days later someone took off. So there were 13 there just for a couple of days.
On this day -- March 27th, there were 13.
Wow but then there was a whole big string here for several days it looks like almost a month.
And then half a month where they had 13 people on the ISS. I wonder if that was crowded or not?
And then how about the loneliest time on the ISS? If we go here to filter.
Get rid of everything. But there's never just a single person.
There was always at least two. Wow lots of times that there were two people.
Probably one on the Russian module and one on the US side at a minimum, I would guess.
I don't know that for sure.
And then we'd have to do some other analyses with some XLOOKUP to get the the crew nationality. That might be interesting to do.
And then the other thing I was interested in. How many people are usually there?
So we'll clear this filter and up here I put the number of crew 0 through 13.
And then a FREQUENCY function here.
A single formula using the Count and then the bins array here.
So the most common number of people in the last 20 years on the ISS is 2903 days with 6. The second most common is 2622 days with 3.
And then interesting here. Two happened a lot.
Early on, before September 1st 2000 there were a lot of days with zero.
And then 10 seemed to happen a fair amount. Never had more than 13.
And at this point given that we don't have the shuttle which could take a crew of seven.
We just have while the crew dragon that can take four, We'll probably never get to that 13 number again, I am guessing.
So there you have it.
Congratulations to all the astronauts on the ISS who have ever been to the ISS.
And to the international partners for 20 continuous years of having someone in the ISS.
Hey, go download this data set. It's a great data set.
I don't know how long it'll be there after this promotion from Microsoft.
If you have some time today or tomorrow go ahead and build a dashboard I will put the URL for the the dashboard competition.
Well if you like this video please Like, Subscribe, and Ring the Bell.
Feel free to post any questions in the comments below.
I want to thank you for stopping by we'll see you next time for another netcast from MrExcel.