Vote for Power Query Enhancement Request - Import Zip Files

jdellasala

Hey,

I wanted to pull some data from a web site, but the Excel files are in a Zip file, and neither Excel nor Power Query can open a Zip file. Essentially a Zip file is a folder, and it seems to me that Power Query should be able to easily convert a Zip file into what would look like a Folder import in Power Query.

I've posted "Power Query Unzip via Web or File or Folder" on the Excel · Community Feedback portal. The Excel developers monitor posts there and sometimes incorporate the ideas.

If you agree this is something that would be helpful in Excel / Power Query, please hop over to the post and give it a vote.

Thanks!
 

"neither Excel nor Power Query can open a Zip file"
Not quite. I am able to use the following code to unzip a Zip file and read its contents successfully. See the source.
(The source refers to an article on Mark White's blog, but I don't think the original code still exists there, so I am copying it below for future reference.)

Power Query:
let
    Source = File.Contents("C:\Users\Downloads\sample-feed.zip"),
    //Function Start - Credit - https://sql10.blogspot.com/2016/06/reading-zip-files-in-powerquery-m.html
    UnzipContents=(ZIPFile) =>
    let
        Header = BinaryFormat.Record([
            MiscHeader = BinaryFormat.Binary(14),
            BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            FileSize   = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
            ExtrasLen  = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)  
        ]),
        HeaderChoice = BinaryFormat.Choice(
            BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            each if _ <> 67324752             // not 0x04034B50, the local file header signature? then return a dummy formatter
                then BinaryFormat.Record([IsValid = false, Filename=null, Content=null])
                else BinaryFormat.Choice(
                        BinaryFormat.Binary(26),      // Header payload - 14+4+4+2+2
                        each BinaryFormat.Record([
                            IsValid  = true,
                            Filename = BinaryFormat.Text(Header(_)[FileNameLen]),
                            Extras   = BinaryFormat.Text(Header(_)[ExtrasLen]),
                            Content  = BinaryFormat.Transform(
                                BinaryFormat.Binary(Header(_)[BinarySize]),
                                (x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
                            )
                            ]),
                            type binary                   // enable streaming
                    )
        ),
        ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),
        Entries = List.Transform(
            List.RemoveLastN( ZipFormat(ZIPFile), 1),   // drop the trailing dummy (IsValid = false) record
            (e) => [FileName = e[Filename], Content = e[Content] ]
        )
    in
        Table.FromRecords(Entries),
    //Function End
    Files = UnzipContents(Source)
in
    Files

All I needed to do then was open one of the binary entries as an Excel file to access the data in it.
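
As a rough sketch of that last step (not part of the original code): the helper below assumes Files is the table returned by UnzipContents above, with its FileName and Content columns. OpenEntry and EntryName are hypothetical names used only for illustration.

```powerquery
// OpenEntry: hypothetical helper, not part of the original function.
// Files     - table returned by UnzipContents (columns FileName, Content)
// EntryName - name of the zip entry to open, exactly as shown in FileName
(Files as table, EntryName as text) as table =>
let
    // Keep only the row(s) whose FileName matches the requested entry.
    Match = Table.SelectRows(Files, each [FileName] = EntryName),
    // Take the first match; this raises an error if no entry has that name.
    EntryContent = Match{0}[Content],
    // Interpret the decompressed bytes as an Excel workbook.
    Workbook = Excel.Workbook(EntryContent)
in
    Workbook
```

Saved as a query named OpenEntry, it could be called as OpenEntry(Files, "Book1.xlsx"), where Book1.xlsx is a placeholder name. The result should be the usual navigation table of sheets and tables in that workbook.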

Edit: For a remote zip file, change the Source step to:
Power Query:
Source = Web.Contents("https://example.com/ZipFile.zip")
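
A remote URL can return something other than a zip (an error page, for example), and in that case the function above would most likely just give back an empty table. A minimal sketch of a guard, assuming it is saved as its own query; the name fxIsZip is hypothetical:

```powerquery
// fxIsZip: hypothetical guard, not part of the original function.
// Returns true when the binary starts with the ZIP local file header
// signature 0x04034B50 (decimal 67324752, the bytes "PK" 03 04).
(ZIPFile as binary) as logical =>
let
    // Read the first four bytes as a little-endian unsigned 32-bit integer.
    ReadSignature = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
    // A binary shorter than four bytes makes the read fail; treat that as "not a zip".
    Signature = try ReadSignature(ZIPFile) otherwise null
in
    Signature = 0x04034B50
```

This is the same signature test the unzip function applies to every entry, pulled out so a query can fail with a clear message before any parsing is attempted.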
 
Awesome, @smozgur! I followed the second link, which has a function to open a Zip file:
Power Query:
(ZIPFile) => 
let
    Header = BinaryFormat.Record([
        MiscHeader = BinaryFormat.Binary(14),
        BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        FileSize   = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
        ExtrasLen  = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)    
    ]),
 
    HeaderChoice = BinaryFormat.Choice(
        BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        each if _ <> 67324752             // not 0x04034B50, the local file header signature? then return a dummy formatter
            then BinaryFormat.Record([IsValid = false, Filename=null, Content=null])
            else BinaryFormat.Choice(
                    BinaryFormat.Binary(26),      // Header payload - 14+4+4+2+2
                    each BinaryFormat.Record([
                        IsValid  = true,
                        Filename = BinaryFormat.Text(Header(_)[FileNameLen]), 
                        Extras   = BinaryFormat.Text(Header(_)[ExtrasLen]), 
                        Content  = BinaryFormat.Transform(
                            BinaryFormat.Binary(Header(_)[BinarySize]),
                            (x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
                        )
                        ]),
                        type binary                   // enable streaming
                )
    ),
 
    ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),
 
    Entries = List.Transform(
        List.RemoveLastN( ZipFormat(ZIPFile), 1),
        (e) => [FileName = e[Filename], Content = e[Content] ]
    )
in
    Table.FromRecords(Entries)
Pass the result of the web query to the function (saved here as fxUnZip), and presto changeo: a table with two columns, FileName and Content, like so:
Power Query:
let
    Source = Web.Contents("https://www.census.gov/foreign-trade/Press-Release/current_press_release/ft900xlsx.zip"),
    Custom1 = fxUnZip( Source )
in
    Custom1
That works the way I wanted. No downloading, no VBA (you know by now I hate, well don't like VBA), and refreshable (I assume!).
Thanks for the pointer!
 
Post Script... fxUnZipXL

The fxUnZip function above produced what looked like a standard PQ expansion of a folder or Excel file, but even though the Content column contained the binary form of an Excel file, a standard Expand didn't recognize the binary format. However, right-clicking one of the Binary items in the Content column let me extract a single binary file, and right-clicking that file produced a tiny menu with an Excel Workbook option.

Selecting Excel Workbook successfully extracted the file.

As there were 34 Excel files in the Zip file I was working with, I extended the PQ function above to also open each extracted file as an Excel workbook. I call the result fxUnZipXL:
Power Query:
/* 
NOTE: THIS WILL TREAT ALL BINARIES FOUND IN THE ZIP FILE AS EXCEL FILES!
*/
(ZIPFile) => 
let
    Header = BinaryFormat.Record([
        MiscHeader = BinaryFormat.Binary(14),
        BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        FileSize   = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
        ExtrasLen  = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)    
    ]),
 
    HeaderChoice = BinaryFormat.Choice(
        BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        each if _ <> 67324752             // not 0x04034B50, the local file header signature? then return a dummy formatter
            then BinaryFormat.Record([IsValid = false, Filename=null, Content=null])
            else BinaryFormat.Choice(
                    BinaryFormat.Binary(26),      // Header payload - 14+4+4+2+2
                    each BinaryFormat.Record([
                        IsValid  = true,
                        Filename = BinaryFormat.Text(Header(_)[FileNameLen]), 
                        Extras   = BinaryFormat.Text(Header(_)[ExtrasLen]), 
                        Content  = BinaryFormat.Transform(
                            BinaryFormat.Binary(Header(_)[BinarySize]),
                            (x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
                        )
                        ]),
                        type binary                   // enable streaming
                )
    ),
 
    ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),
 
    Entries = List.Transform( List.RemoveLastN( ZipFormat(ZIPFile), 1), (e) => [FileName = e[Filename], Content = e[Content] ]  ),
    ExtractBinaries = Table.FromRecords(Entries),
    AddedWorkbookTables = Table.AddColumn(ExtractBinaries, "Workbooks", each Excel.Workbook( [Content] )),
    RemovedContentColumn = Table.RemoveColumns(AddedWorkbookTables,{"Content"}),
    ExpandedWorkbooks = Table.ExpandTableColumn(RemovedContentColumn, "Workbooks", {"Name", "Data", "Item", "Kind", "Hidden"}, {"Name", "Data", "Item", "Kind", "Hidden"})
in
    ExpandedWorkbooks
I suppose a second parameter could be added to the original PQ Function to specify what format to extract, but that would only work if all the files in the Zip file were the same type. I'll think about that one.
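
One way that idea might look, as a sketch only: rather than a format parameter, filter the entries by file extension before opening them, so a Zip file with mixed contents doesn't break the workbook step. This assumes the plain fxUnZip function from the earlier post exists as a query. fxUnZipByExt and its Extension parameter are made-up names.

```powerquery
// fxUnZipByExt: hypothetical wrapper around the fxUnZip function above.
// Extension defaults to ".xlsx" when omitted.
(ZIPFile as binary, optional Extension as nullable text) as table =>
let
    Ext = Text.Lower(if Extension = null then ".xlsx" else Extension),
    Entries = fxUnZip(ZIPFile),
    // Keep only entries that decompressed successfully and whose
    // name ends with the requested extension (case-insensitive).
    Matching = Table.SelectRows(
        Entries,
        each [Content] <> null and Text.EndsWith(Text.Lower([FileName]), Ext)
    ),
    // Same steps as fxUnZipXL from here on.
    AddedWorkbookTables = Table.AddColumn(Matching, "Workbooks", each Excel.Workbook([Content])),
    RemovedContentColumn = Table.RemoveColumns(AddedWorkbookTables, {"Content"}),
    ExpandedWorkbooks = Table.ExpandTableColumn(
        RemovedContentColumn,
        "Workbooks",
        {"Name", "Data", "Item", "Kind", "Hidden"}
    )
in
    ExpandedWorkbooks
```

Entries that don't match are simply skipped, so folder entries or a readme inside the Zip file no longer raise errors in the Workbooks column.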
Note the warning in the comment at the start of the function. If someone knows how to add that to the function's display, please let me know. I've been unsuccessful using the Document commands so far!
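
A possible approach, offered as an untested sketch: attach a documentation record to the function's type with Value.ReplaceType. It assumes fxUnZipXL above exists as a separate query and that the function dialog reads the Documentation.* metadata fields. Impl, DocMeta and Documented are illustrative names.

```powerquery
// Hypothetical wrapper: assumes fxUnZipXL above exists as a separate query.
let
    // Typed wrapper so the function type carries parameter and return types.
    Impl = (ZIPFile as binary) as table => fxUnZipXL(ZIPFile),
    // Documentation record; the field names are the standard Documentation.* keys.
    DocMeta = [
        Documentation.Name = "fxUnZipXL",
        Documentation.LongDescription =
            "NOTE: THIS WILL TREAT ALL BINARIES FOUND IN THE ZIP FILE AS EXCEL FILES!"
    ],
    // Copy the function's own type, attach the documentation as metadata,
    // and put the annotated type back on the function.
    Documented = Value.ReplaceType(Impl, Value.ReplaceMetadata(Value.Type(Impl), DocMeta))
in
    Documented
```

Reusing the function's own type via Value.Type keeps the parameter list unchanged, so only the metadata differs from the original function.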
In the meantime, I hope someone can use this!
 