Use Power Query to Load Many Web Pages


February 24, 2023 - by

Use Power Query to Load Many Web Pages

Problem: I need to load data from a series of web pages. The Web Query technique from Excel 2003 rarely works with modern web pages.

Strategy: Use a hack with Power Query to create a Query of Queries.


Power Query is a new tool that is a free download for Excel 2010 & Excel 2013 and is built in to the Excel 2016 Data ribbon.

In this example, you are trying to pull data from a website for Dallas for January 2, 2015. You can see there are paremeters in the URL that specify the city and the date.

A website offers historical weather for a city and date. The URL of the page encodes the city and date in the address. For example, airport/KDFW/2015/1/2/DailyHIstory.html
This is perfect for using Power Query's from web feature.
Figure 1025. This website is a great candidate for scraping web data.


Use Data, New Query, From Other Sources, From Web. Enter the URL. Use Power Query tools to isolate the information from the web page. In my case, the query returns one row with Min & Max temperature and precipitation.

The query for one city returns Max Temperature, Min Temperature, and Precipitation in inches.
Figure 1026. Results of the query for one web page.

Once the query for a single web page is working, you will convert this query to a function. Edit the query. use the View tab in Power Query and choose Advanced. You will see the code behind the query. Insert a new line above the word Let. Type (MyURL)=>. Scan through the remaining code and replace the actual “URL” with the word MyURL.

In the Advanced View in Power Query, type these at the start of the query to turn it into a function:
(MyURL)=>
let
---
later, in the query definition, change URL to say MyURL
Figure 1027. Use the Advanced view to add a parameter.

When you close the Advanced view, you will notice several alarming things. The list of Applied Steps will be replaced with a single step. Power Query will be asking you to enter the Parameter for MyURL. Ignore these apparent problems. Type a new name for the query such as fxWeather. (Note that the fx prefix is geek-speak for Function).

In the Query Settings panel, change the Name to fnWeather.
Figure 1028. Rename the query. You will need this exact name later, with the exact case.

When you choose Home, Close & Load, your original returned results will disappear. Don’t be alarmed. Nothing will load, but you will see a query panel indicating there is a connection-only query called Weather.

In the Workbook Queries pane, 1 query called fnWeather appears as Connection Only.
Figure 1029. The data is gone, but the query is still there.

Next, you will build a table of all of the URL’s. In my case, I created a worksheet where I could change the airport code and the start date. Formulas in the table then generate the list of URLs. Make sure to Format as Table using Ctrl+T.

Enter the airport in E1 and a start date in E2. Put the opening part of the URL in E3 and the ending part in E4. Formulas in A7 through C37 list the dates for the month, encode it as 2016/10/1 and then concatenate the full URL in C7 and below.
Figure 1030. Use Excel formulas to generate a list of URLs.

Select one cell in the date table. Select Data, Get & Transform, From Table. In Power Query, choose Add Column, Add Custom Column.

The Add Custom Column dialog in Power Query. The new column name is Weather. The Custom Column Formula is =fnWeather([URL]).
Figure 1031. Use the fnWeather query to create a custom column.

Because the function is returning an array of three values, you have to click the Expand icon and choose to add all three columns. Make sure to unselect the Prefix option.

Expand the Weather column. Uncheck Use Original Column Name as Prefix. Keep all three columns checked for Max Temperature, Min Temperature, and Precipitation.
Figure 1032. Expand the column to show all three fields.

When you close & load, you will notice that Power Query is taking a few seconds to update each row. If you were returning thousands of rows, you would want to load just before going to lunch.

The query result shows one line per day, with Date, Max, Min, Precipitation.
Figure 1033. Each row contains data from a separate web page.

This article is an excerpt from Power Excel With MrExcel

Title photo by Christopher Burns on Unsplash