Pulling Data from a Load-As-You-Scroll List

rjbinney

Active Member
Joined
Dec 20, 2010
Messages
293
Office Version
  1. 365
Platform
  1. Windows
Dear Mr. Excel:

I am loving using "Get Data" and pulling tables off the web - Wikipedia tables work quite nicely.

I've been running across a few data sets where, in a normal browser, the page loads the first few items until you scroll to the end, then it loads the next few, and so on. Kinda like a Reddit feed.

When I use Get Data=>From Web, it pulls out a nice table, but ONLY for that first load. It doesn't grab the whole data set.

I don't know anything about web coding, Python, or anything like that, so my flicking through "Inspect Page" in my browser I felt like my dog watching nature shows. I asked a few AI search engines if they could pull the data, but the only thing that came close wanted to write me a script that I'd load in some emulator and blah blah blah.

Any thoughts?

Here's an example data set (the actual one I'm working with is proprietary, but this one seems to behave similarly in my browser).

Any thoughts would be cool.

Thanks!

Love,
 

Excel Facts

Links? Where??
If Excel says you have links but you can't find them, go to Formulas, Name Manager. Look for old links to dead workbooks & delete.
I don't think you can just use the basic Get Data interface to get what you need in this case. The data here isn’t directly available other than the first page —it’s paginated, meaning it loads in chunks as you scroll down the page. Behind the scenes, this happens through a JavaScript fetch call in the browser.

So, to get the data, we need to dig deeper. Instead of pulling from the HTML you see, we’ll fetch the data directly from the web service (Web API). If you’re curious and want to explore some M language methods for working with web services, keep reading.

When you scroll the page and track the network calls (browser Developer Tools - Network tab), you’ll notice that the endpoint https://www.digitalspy.com/api/listicle-slides gets called. This endpoint provides the paginated data, and it takes two query parameters:
  • id: the identifier for the image set group (for example: 93977ede-6f1e-4977-982a-a2a317f893b2 in this case. You need to find this and I didn't go that far. Perhaps, the page that you got the original link in your question might be giving some information about it.
  • page: the page number, starting at 1 and incrementing for each subsequent page
We just need to call this API repeatedly for each page to fetch the data, which is returned in JSON format.

{endpoint}?id=93977ede-6f1e-4977-982a-a2a317f893b2&page=1
{endpoint}?id=93977ede-6f1e-4977-982a-a2a317f893b2&page=2
{endpoint}?id=93977ede-6f1e-4977-982a-a2a317f893b2&page=3
{endpoint}?id=93977ede-6f1e-4977-982a-a2a317f893b2&page=...

See how the page parameter changes with each call?

Now, how do we know when to stop? Normally, APIs provide pagination metadata (like total pages or item counts), but this one doesn’t. Instead, it is just returning empty data when there’s nothing left for the queried page.

So, we can keep calling until we get an empty data set.
Alternatively, we could count the items returned in each page and stop when we see fewer items than the first page has. Personally, I’d just make one extra API call and handle the empty response—it’s simpler to code that way instead of dealing with the unknown first page item count (what if there is only one page data!).

Finally, here’s the code that does what we need. It includes a function (fxAPI) that’s called recursively and the beautiful List.Generate function, which helps us collect each data set into a list. From there, it’s just about creating a meaningful table at the end. In the final step of the query, you’ll notice some columns containing Records that you can expand to get the full details.

Create a blank query. Open the Advanced Editor, delete the default content, and then copy and paste the code below. Click Done to execute it. If you see any prompts asking for permission to access the web content, just confirm it and proceed as an Anonymous user.
Power Query:
let
    fxAPI = (id as text, page as number) =>
        let
            Result = Json.Document(Web.Contents("https://www.digitalspy.com/api/listicle-slides?id=" & id & "&page=" & Text.From(page)))
        in
            Result,
    id = "93977ede-6f1e-4977-982a-a2a317f893b2",
    Pages = List.Generate(
        () => [x = 1, data = fxAPI(id, 1)],
        each List.Count([data]) > 0,
        each [x = [x] + 1, data = fxAPI(id, [x] + 1)],
        each [data]
    ),
    Result = List.Combine(Pages),
    ConvertToTable = Table.FromList(Result, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExpandRecords = Table.ExpandRecordColumn(ConvertToTable, "Column1", Record.FieldNames(ConvertToTable{0}[Column1]))
in
    ExpandRecords
 
Last edited:
Upvote 0

Forum statistics

Threads
1,223,952
Messages
6,175,595
Members
452,656
Latest member
earth

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top