Combine multiple API calls into one result

Ricyteach

New Member
Joined
Jun 19, 2013
Messages
7
Hello; I am cross-posting this question from SO.

I'm using an API (for a project management database) that returns a result in the following form (in JSON format):

JSON:
{'Content':
    {
        'Projects': ['project1', 'project2', 'project3', 'project4'],
        'InvoiceGroups': ['group1', 'group2', 'group3', 'group4'],
        'Phases': ['phase1', 'phase2', 'phase3', 'phase4'],
        'Resources': ['resource1', 'resource2', 'resource3', 'resource4'],
    }
}

NOTE: in reality there are tens of thousands of entries in each of these inner/nested lists; and each entry is actually not a string but a JSON object (i.e., Power Query record).

The API limits the database request to about 2000 projects at a time, so I have to break the request up into chunks. So instead of getting a single JSON object (PQ record) back with the entire response, I am getting multiple of them in a list.

So I have something like this, which is a list of JSON objects (PQ list of PQ records):

JSON:
[
    {
        'Content':
        {
            'Projects': ['project1'],
            'InvoiceGroups': ['group1'],
            'Phases': ['phase1'],
            'Resources': ['resource1'],
        }
    },
    {
        'Content':
        {
            'Projects': ['project2'],
            'InvoiceGroups': ['group2'],
            'Phases': ['phase2'],
            'Resources': ['resource2'],
        }
    },
    {
        'Content':
        {
            'Projects': ['project3'],
            'InvoiceGroups': ['group3'],
            'Phases': ['phase3'],
            'Resources': ['resource3'],
        }
    },
    {
        'Content':
        {
            'Projects': ['project4'],
            'InvoiceGroups': ['group4'],
            'Phases': ['phase4'],
            'Resources': ['resource4'],
        }
    },
]

My question is: how can I write an efficient Power Query function to recombine these separate responses back into the original response, with the nested lists recombined the way they should be?

I attempted just using the [`Record.Combine`][1] function, but it does not work:

Power Query:
let
    recombined = Record.Combine(list_of_records)
in
    recombined

(NOTE: I'd prefer to do this in Python but for reasons I won't go into, we do need to do this in PQ.)


[1]: Record.Combine - PowerQuery M
 
Last edited by a moderator:

Excel Facts

Wildcard in VLOOKUP
Use =VLOOKUP("Apple*" to find apple, Apple, or applesauce
For convenience: here are the objects using PQ format instead of JSON:

STARTING DATA STRUCTURE
Power Query:
{
    [
        Content =
        [
            Projects = {"project1"},
            InvoiceGroups = {"group1"},
            Phases = {"phase1"},
            Resources = {"resource1"},
        ]
    ],
    [
        Content =
        [
            Projects = {"project2"},
            InvoiceGroups = {"group2"},
            Phases = {"phase2"},
            Resources = {"resource2"},
        ]
    ],
    [
        Content =
        [
            Projects = {"project3"},
            InvoiceGroups = {"group3"},
            Phases = {"phase3"},
            Resources = {"resource3"},
        ]
    ],
    [
        Content =
        [
            Projects = {"project4"},
            InvoiceGroups = {"group4"},
            Phases = {"phase4"},
            Resources = {"resource4"},
        ]
    ],
}

DESIRED DATA STRUCTURE
Power Query:
[
    Content =
    [
        Projects = {"project1", "project2", "project3", "project4"},
        InvoiceGroups = {"group1", "group2", "group3", "group4"},
        Phases = {"phase1", "phase2", "phase3", "phase4"},
        Resources = {"resource1", "resource2", "resource3", "resource4"},
    ]
]
 
Upvote 0
Can you also show the pagination part of the JSON? It should be either at the bottom or at the top of the batch.

If there is no pagination, then how would you make the next calls?
 
Upvote 0
This is just a toy example illustrating the problem. The server does not paginate anything. I'm not familiar with JSON pagination in any case.
 
Upvote 0
The API limits the database request to about 2000 projects at a time, so I have to break the request up into chunks. So instead of getting a single JSON object (PQ record) back with the entire response, I am getting multiple of them in a list.
From this paragraph, it sounded, to me, like you were making multiple calls to get the data in batches. That's the reason of my question.

However, I will assume you have one big file with all records considering your last reply. By the way, the JSON you posted in the initial question is not a valid JSON. The other sample you sent in the second post is not even a JSON string. It should be something similar to this:

JSON:
[
  {
        "Content": {
            "Projects": ["project1"],
            "InvoiceGroups": ["group1"],
            "Phases": ["phase1"],
            "Resources": ["resource1"]
        }
    },
    {
        "Content": {
            "Projects": ["project2"],
            "InvoiceGroups": ["group2"],
            "Phases": ["phase2"],
            "Resources": ["resource2"]
        }
    },
    {
        "Content": {
            "Projects": ["project3"],
            "InvoiceGroups": ["group3"],
            "Phases": ["phase3"],
            "Resources": ["resource3"]
        }
    },
    {
        "Content": {
            "Projects": ["project4"],
            "InvoiceGroups": ["group4"],
            "Phases": ["phase4"],
            "Resources": ["resource4"]
        }
    }
]

Now, let's assume this is the data structure. What is the result you'd like to get from this data after transformation? As a table, not as a new JSON string (I don't think you'll be using PQ to create another JSON string). I understand this is just a data sample, but I am having difficulty understanding the structure.

My recommendation, unless someone who would understand the big picture either in here or SO shows up, then the perfect way would be showing the sample data before the transformation, and desired result after the transformation, both in table form (skip the JSON part, you will already load it as you need in your data). That way, we can see the original and desired final data structure clearly and suggest/provide a transformation method in PQ.
 
Upvote 0
Thank you for helping. Really appreciate it.

I'm sorry if the JSON didn't come through correctly (and I could not figure out how to edit that first post in this forum). I created the toy example to make things more simple, and didn't check it.

Yes: I am sending multiple calls to the API, like below. The second to last variable GetProjectsResponseList_filtered is simply a list of the responses to the individual calls to the API. The calls made using List.Transform and my GetProjects function.

Power Query:
let
    // API info
    AjeraAPI =
    let
        rec = [
            URL = "https://ajera.com/my_api_key",
            JSON_HEADERS = [#"Content-Type"="application/json", Accept="application/json"],
            USER = "my_user",
            PW = "my_pw"
            ]
    in
        rec,

    SessionToken =
    let
        CreateAPISession = Json.Document(Web.Contents(AjeraAPI[URL], [Headers = AjeraAPI[JSON_HEADERS], Content = Json.FromValue([Method="CreateAPISession", Username=AjeraAPI[USER], Password=AjeraAPI[PW], APIVersion=2, UseSessionCookie=false])])),
        TOKEN = CreateAPISession[Content][SessionToken]
    in
        TOKEN,

    // function: retrieve a list projects
    ListProjects = (search_record as record) => Json.Document(Web.Contents(AjeraAPI[URL], [Headers = AjeraAPI[JSON_HEADERS], Content = Json.FromValue([Method="ListProjects", SessionToken=SessionToken, MethodArguments=search_record])]))[Content],

    // function: retrieve database records for a list of project keys
    GetProjects = (project_key_list as list) => Json.Document(Web.Contents(AjeraAPI[URL], [Headers = AjeraAPI[JSON_HEADERS], Content = Json.FromValue([Method="GetProjects", SessionToken=SessionToken, MethodArguments=[RequestedProjects=project_key_list]])])),

    // get a list of filtered project keys, and chunk it into a list of lists
    ListProjectsResponse_filtered = ListProjects([]), // example filter: [FilterByIDLike = "28222"]
    ListProjectsResponseTable_filtered = Table.FromRecords(ListProjectsResponse_filtered[Projects]),
    ProjectKeyList_filtered = ListProjectsResponseTable_filtered[ProjectKey],
    ChunkSize = 1999, // max number of API requests allowed at one time
    ProjectKeyChunkedListOfLists_filtered = List.Split(ProjectKeyList_filtered, ChunkSize),

    // get multiple (from each chunk) server responses from GetProjects function, in a list
    GetProjectsResponseList_filtered = List.Transform(ProjectKeyChunkedListOfLists_filtered, GetProjects),

    // now recombine the server responses
    GetProjectsResponse_filtered = // %%%%%%******what to put here???******%%%%%%%
in
    GetProjectsResponse_filtered

Now this is really important: the final result I need to provide is not a table, but a data structure containing a lot of data, for multiple tabes. The reason is, this is a tool I am developing for a corporate application that is going to be handed off to other people to make whatever tables they wish out of the data structure. They will make separate PQ references/queries based on this query, and make whatever tables they need out of the data. Because it is being used this way I need the handwritten M code all contained in a single, separate query that other users will not touch.

In order to keep things simple, the toy example glosses over the issue that each of the four lists in the data structure (Projects, InvoiceGroups, Phases, Resources) is actually not just a list of strings like in the toy, but a list of records, each of which could be a table.

In other words the server response contains not just one table but FOUR.

But I don't think any of this should matter much. If I can figure out the best way to get the toy example "STARTING DATA STRUCTURE" turned into "DESIRED DATA STRUCTURE", then everything will work ("DESIRED DATA STRUCTURE" and "STARTING DATA STRUCTURE" are from my previous post).

However since you asked for more info, and since I am happy to be open minded that the solution I have come up with (outlined just now) may not be the best, here is a more complete toy showing the form of an individual server response:

Power Query:
[
    ResponseCode = 200,
    Message = "Success",
    Errors = {},
    Content =
    [
        Projects = {
            // < THIS LIST IS 10s OF THOUSANDS OF RECORDS LONG AND EACH RECORD HAS ~100 FIELDS EACH>
            // < MOST FIELDS ARE STRINGS AND NUMBERS BUT MANY ARE NESTED RECORDS>
        },
        InvoiceGroups = {
            // < THIS LIST IS 10s OF THOUSANDS OF RECORDS LONG AND EACH RECORD HAS ~100 FIELDS EACH>
            // < MOST FIELDS ARE STRINGS AND NUMBERS BUT MANY ARE NESTED RECORDS>
        },
        Phases = {
            // < THIS LIST IS 10s OF THOUSANDS OF RECORDS LONG AND EACH RECORD HAS ~100 FIELDS EACH>
            // < MOST FIELDS ARE STRINGS AND NUMBERS BUT MANY ARE NESTED RECORDS>
        },
        Resources = {
            // < THIS LIST IS 10s OF THOUSANDS OF RECORDS LONG AND EACH RECORD HAS ~100 FIELDS EACH>
            // < MOST FIELDS ARE STRINGS AND NUMBERS BUT MANY ARE NESTED RECORDS>
        },
    ]
]

The contents of GetProjectsResponseList_filtered is a list of these server responses, that have been chunked so that the project list is only 1999 records long, and not tens of thousands.

My goal, basically, is to "unchunk" these responses.
 
Last edited:
Upvote 0
BTW I am sorry if my PQ code isn't very good/hard to read; I only started learning PQ on Monday. I've been coding python for a decade now though so it wasn't too terrible to pick up, but I am at a loss as to how to recombine this data structure/server response in the M code.
 
Upvote 0
It is hard to make this work without actual data, but I'll try to do my best by using the mockup data.

Save the following JSON string that I have fixed above as data.json somewhere in your computer (accessible from Power Query).
* The ExpandedColumns step is supposed to be the data you should provide from the actual source when you implement this in real.
JSON:
[
  {
        "Content": {
            "Projects": ["project1"],
            "InvoiceGroups": ["group1"],
            "Phases": ["phase1"],
            "Resources": ["resource1"]
        }
    },
    {
        "Content": {
            "Projects": ["project2"],
            "InvoiceGroups": ["group2"],
            "Phases": ["phase2"],
            "Resources": ["resource2"]
        }
    },
    {
        "Content": {
            "Projects": ["project3"],
            "InvoiceGroups": ["group3"],
            "Phases": ["phase3"],
            "Resources": ["resource3"]
        }
    },
    {
        "Content": {
            "Projects": ["project4"],
            "InvoiceGroups": ["group4"],
            "Phases": ["phase4"],
            "Resources": ["resource4"]
        }
    }
]

Create a new blank query in Power Query, go to Advanced View and copy and paste the following M code.
Power Query:
let
    // Use the sample JSON I fixed above as a local file - change the path with yours
    Source = Json.Document(File.Contents("C:\some\path\data.json")),
    // Convert the imported list to a table
    ConvertToTable = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    // Expand the content of the generated table
    ExpandedContent = Table.ExpandRecordColumn(ConvertToTable, "Column1", {"Content"}),
    // Expand all columns of the table
    ExpandedColumns = Table.ExpandRecordColumn(ExpandedContent, "Content", Record.FieldNames(ExpandedContent{0}[Content])),
    // Move the headers into the data - you'll see why in the next step
    DemotedHeaders = Table.DemoteHeaders(ExpandedColumns),
    // Transpose the table, the first columns return to the content properties
    TransposedTable = Table.Transpose(DemotedHeaders),
    // The data columns in the table - skip the first column
    Columns = Table.ColumnNames(Table.RemoveColumns(TransposedTable, {"Column1"})),
    // This is tricky, simply adding a new column consist of structural data (lists) in the data columns
    CombinedColumns = Table.AddColumn(TransposedTable, "Combined", (r) => List.Combine(List.Transform(Columns, each Record.Field(r, _)))),
    // Clean up
    Result = Table.RenameColumns(Table.RemoveColumns(CombinedColumns, Columns), {{"Column1", "Content"}})
in
    Result

Then examine the steps. I intentionally didn't use any hard coded column names, so as long as the data structure is compatible, it should deal with any number of content property, and lists with any number of structured (list) or primitive (text, number, etc) value type items. The CombinedColumns step is most likely the step that I believe you'll be interested.

So, basically, the following is the Content data at the ExpandedColumns step;
1695958992346.png


Finally, you'll get the following result:
1695958572034.png

I know the code might not fit to the actual data structure as it is but I tried to make sure it fits your starting and desired data structures as in the given samples. I hope it helps.
 
Upvote 0

Forum statistics

Threads
1,223,880
Messages
6,175,154
Members
452,615
Latest member
bogeys2birdies

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top