Excel's XML SOURCE window showing heirarchy - Need to transfer or extract it's contents ASAP

ChrisOK

Well-known Member
Joined
Mar 26, 2003
Messages
601
As a long time member of MrExcel, I'm hoping all the Aladin Akyurek genius-types of folks can provide some desperately needed help -- I've been searching for weeks and not finding a *single soul* that knows anything on this topic or it's Excel functionality which seems shocking!?!

With client deadline fast approaching and knowing that it's sometimes hard to visualize; I recently posted on 2 sites that have the ability to upload a sample pic file - to better explain..
Cross-post link disclaimer is available if seeing the uploaded visual is helpful:
https://www.excelforum.com/excel-pr...336-extract-heirarchy-from-source-window.html
http://www.msofficeforums.com/excel-programming/37789-extract-heirarchy-source-window.html

The image shows the several tiers of the Parent Child relationships within the EXCEL XML SOURCE window. (there's at least 6 tiers)

I simply need everything that's on this list to be layed into a spreadsheet or document.
Ideally, if it could mirror the architecture that'd be great to see visually at a glance how many layers deep an element lies...

Heck, I'll even take it in a hard to read layout like:
Level.........Name........
1..............Abc (which is obviously the parent as a '1')
2..............Bbc (level 2 is a child to the parent listed above it)
3..............Ccc (level 3 here is a child to the child)
1..............Ddd
2..............Eee
1..............
2..............
3..............
4..............
5..............
6..............

Ultimately the client needs to see (with as much ease as possible) a full list of everything in that map... to gauge magnitude... next to see how many parents there are -- and determine how many levels inward they want us to go as a dev team -- to tackle issues within a system.

When I select all, copy & paste everything in that XML Source window, (paste it long ways along row one of a sheet) -- it holds over 8,000 items
I'm finding that it is leaving out the PARENTS which is a critical piece...
So, I don't have any idea HOW MANY items are REALLY within that XML Source window because what got pasted is NOT all inclusive! Frustrating!

The main 4 things I need are the PARENT, REQUIRED PARENT, CHILD and REQUIRED CHILD (based on the icons shown in the XML Source window, those are the 4 items I need a list of)

Need help with some VBA or a method for grabbing everything in that window and pasting it either into an Excel sheet - -or even into Word if that's easier --

Any ideas how to pull out what's being displayed visually within the Excel files XML Source window?
Thanks in advance -- praying SOMEONE -- ANYONE knows something about this window and how to get what's in it --- OUT!
Chris
 
In attempt to rule out it being computer specific, I emailed the 2 files to a diff computer, diff model, etc -- and placed them within a folder on the desktop. Turned on the 6.0 References area object.
Tried re-running and the code and it continues to stop in the same area, with the same error - User Defined Type Not Defined.
Using Excel 2016 -- is there some nuance that might be diff there?
NoSparks are you also using Excel 2016?
The plot thickens ---

Hand-typing 8,000 plus tomorrow will be daunting -- I hope a solution comes to light --
NO SPARKS -- can you please upload the one that you have -- that WORKS -- to "BOX" (free space to upload things)
https://www.box.com/

I'd love to see what it looks like (even if small font)
and if anyone else is having luck -- PLEASE UPLOAD TO the BOX site!
This might be a huge help --
 
Last edited:
Upvote 0

Excel Facts

When did Power Query debut in Excel?
Although it was an add-in in Excel 2010 & Excel 2013, Power Query became a part of Excel in 2016, in Data, Get & Transform Data.
I have Excel 2010 64bit.

This is what I originally tried. Added a new row 1 and formula to see how many cells were filled in each column.
https://app.box.com/s/6be18cebt7n2j4xvxwftz5oymrg8ykiu

This is your file, I made the veryhidden sheet visible and moved it to be second sheet and ran macro again.
https://app.box.com/s/kd54cvb1z2v0hhrkih97l287kujz9p50

I don't know anything about xml or xsd files, just following this out of interest.
First thing I did after you posted where to get the xsd file was open it in NotePad++, it's contents are all Greek to me. :)
 
Last edited:
Upvote 0
Here is another approach for the main procedure. Instead of loading the .xsd file (the .Load ThisWorkbook.Path & "\PDS_Schema_v2.5.1.xsd" line), this new code parses the internal XML schema previously added to the workbook via Developer tab > Source > XML Maps...

I have included code to anticipate errors and it will display a message box if it can't find the xs:element name="ProcurementDocument" element.

Code:
Public Sub Extract_Internal_XML_Document_Structure()
    
    'Requires reference: Microsoft XML v6.0
    
    Dim XMLinternalSchema As XmlSchema
    Dim XMLdoc As DOMDocument60
    Dim XMLmainNode As IXMLDOMNode
    Dim destCell As Range
    Dim rowOffset As Long
    
    On Error Resume Next
    Set XMLinternalSchema = ThisWorkbook.XmlMaps(1).Schemas(1)
    On Error GoTo 0
    If XMLinternalSchema Is Nothing Then
        MsgBox "This workbook doesn't contain an XML schema"
        Exit Sub
    End If
    
    Set XMLdoc = New DOMDocument60
    'Set XMLdoc = CreateObject("MSXML2.DOMDocument.6.0") 'for late binding
    
    With XMLdoc
        'The .xsd file contains the following schema:
        '< xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified" version="07112016">
                
        'How To Specify Namespace when Querying the DOM with XPath - https://support.microsoft.com/en-us/help/294797
        
        .async = False
        .validateOnParse = False
        .SetProperty "SelectionLanguage", "XPath"
        .SetProperty "SelectionNamespaces", "xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        .LoadXML XMLinternalSchema.XML
    End With
    
    With ThisWorkbook.Worksheets(1)
        .Cells.ClearContents
        Set destCell = .Range("A1")
    End With
    
    'Get the main document node, < xs:element name="ProcurementDocument">.  All data elements are contained in this node
    
    Set XMLmainNode = XMLdoc.SelectSingleNode("//xs:element [@name='ProcurementDocument']")
    
    If Not XMLmainNode Is Nothing Then
        Scan_ElementsToCells XMLdoc, XMLmainNode, destCell, rowOffset, 0
        MsgBox "Finished parsing internal XML schema '" & XMLinternalSchema.Name & "' in XML map '" & XMLinternalSchema.Parent.Name & "'" & vbNewLine & vbNewLine & _
               "Last row written to = " & destCell.Row + rowOffset - 1
    Else
        MsgBox "Unable to find xs:element with name 'ProcurementDocument' in internal XML schema '" & XMLinternalSchema.Name & "' in XML map '" & XMLinternalSchema.Parent.Name & "'"
    End If
        
End Sub
I'm using Excel 2016 32-bit and the code writes to 11,182 rows. Cell I11182 contains "ServiceContractActText", which is the last element shown in Excel's XML Source tree.
 
Last edited:
Upvote 0
Very excited to see a couple new things to try this morning - thank you John & NoSparks for this -- feels like we're so close -- I'll test/address 1 at a time..
JOHN - here's the result of trying a new file, new folder holding only the excel file and the prds 3.0 xsd file -- using the new code in Module 1.
https://app.box.com/s/0fw1m6tgn9an4vpq0920cim9vo2f4sfh

FYI: here's the specs my system:
windows 7 professional, 64 bit operating system, excel 2016
one thought: is there anything else that could be turned on in the TOOLS>References (Object Library) that could be causing an issue - (based on the error we get?) if it's possible, I can take inventory of what few items are checked and post that -- if it would help?
 
Upvote 0
I have Excel 2010 64bit.

This is what I originally tried. Added a new row 1 and formula to see how many cells were filled in each column.
https://app.box.com/s/6be18cebt7n2j4xvxwftz5oymrg8ykiu

This is your file, I made the veryhidden sheet visible and moved it to be second sheet and ran macro again.
https://app.box.com/s/kd54cvb1z2v0hhrkih97l287kujz9p50

I don't know anything about xml or xsd files, just following this out of interest.
First thing I did after you posted where to get the xsd file was open it in NotePad++, it's contents are all Greek to me. :)

NOSPARKS: OMG! That's EXACTLY WHAT I NEED!! (the first link -- showing the *BEAUTIFUL* hierarchy!) -- I'm SOOO frustrated as to why I can't get it to run on this side because I've got another file that I need to do the same thing with --- so I can compare the delta of the 2 hierarchies... arrgh!
Now do you see why I didn't want to have to hand type? 11,000+ items -- no way I could meet the client request in time by hand..
I'm DANCING right now because I see it's possible but feel sick to my stomach that it's before me like candy again -- but I can't have a taste!

The 2nd "your file" link is the result of another code attempt and as you can see -- although admirable, it's just not the hierarchy that is needed as your first link shows...
As far as file contents - -yeah, Greek to me too -- lol (but as you've demonstrated, it can be pretty no matter how Greek it is -- if the code extrapolates it properly and lays it into a nice pretty hierarchy!

Gotta figure out what is causing the disconnect.... probably something obvious and small that will make a world of diffc if we can just find it --
 
Upvote 0
It's great that the file I posted is exactly what you need but I didn't really have anything to do with it.
You supplied a link to what you're working with and John supplied the code to try and it just worked.

John's second code works fine for me too.

Looking at the last picture you've linked to...
I'd say you don't have the Microsoft XML, v6.0 included and, judging by the slider size on the right side of the code window, haven't included the Private Sub Scan_ElementsToCells either.
 
Upvote 0
JOHN - here's the result of trying a new file, new folder holding only the excel file and the prds 3.0 xsd file -- using the new code in Module 1.
https://app.box.com/s/0fw1m6tgn9an4vpq0920cim9vo2f4sfh

FYI: here's the specs my system:
windows 7 professional, 64 bit operating system, excel 2016
one thought: is there anything else that could be turned on in the TOOLS>References (Object Library) that could be causing an issue - (based on the error we get?) if it's possible, I can take inventory of what few items are checked and post that -- if it would help?
I should have been clearer. Put the code for the new main procedure, Extract_Internal_XML_Document_Structure, in the same module as the previous code. Place it anywhere in the module below the Option Explicit line.

The only reference you need is Microsoft XML v6.0.

The new procedure calls the same subroutine (Scan_ElementsToCells) as the previous main procedure, but processes the XML schema stored within the workbook, which has been previously added to the workbook via Developer tab > Source > XML Maps... > Add... > browse to and select the .xsd file > click Open > click OK. The map is displayed in Excel's XML Source window. Because the new macro processes the internal schema you don't need the .xsd file.

Here is your workbook with all my code added to it (in modDOMdocument) and the reference added, and the result of running the Extract_Internal_XML_Document_Structure macro in Sheet2.

https://drive.google.com/open?id=15Qhn8FHOmOatHpm38oQ8AdTfQAH1ah12

The internal schema in your workbook is smaller than the schema in the file at https://www.acq.osd.mil/dpap/pdi/eb/docs/PDS_Schema_v2.5.1.xsd, and the last row number the macro outputs is 3084.
 
Upvote 0
Head is spinning -- trying to digest this - -and am trying to figure out if there was a cross-over between the 2 files which might have caused the prob...
The orig xsd file trying to tackle is the.. "PDS_Schema_2.5.1" ......https://app.box.com/s/5qycephvnzys60y57d84r6blyixkizem
The other xsd file to compare is the..... "PRDS_v3.0" ................. https://app.box.com/s/zf4xi4ohs525yrwkwwrij238pruxpbsq
The hierarchy is slightly diff and that was my end-goal was to find a way to extract the hierarchy of each of those files and lay them side by side so the client can get a visual, then find a way to run a delta.
Here's what I'm talking about -- https://app.box.com/s/p1z0h1gobyth9g2bhlb7cogq3bhpwf32

Facts/Questions:
=I definitely have turned on the VBA References XML 6.0 item (do I need to turn it on every time I open each of those files?) - I only did it once thinking once checked, it's "ON" for everything
...maybe that could be an issue

=I can see that the code John wrote specifically refers to the actual file name (is it as easy as changing out that one file name and it will work for various xsd files?)
..referenced file name in this line of code: .Load ThisWorkbook.Path & "\PDS_Schema_v2.5.1.xsd"
..I'm going to start another from scratch to verify whether the 2.5.1 file was sucked into the SOURCE or if the 3.0 was... as THAT could have been the issue

Thanks John for providing the updated code with everything *exactly where you intended for it to go* --- I was worried with the various inserts - that something might not be as intended..
I've downloaded this new one with John's new code and will use the code in the module called: "modDOMdocument" in a brand new file and verify that the 2.5.1 file is the one used (either by having it in the folder -- or by ADDING IT via the SOURCE window) -- I guess I'll have to try both ways at this point b/c I'm unsure which is the best/correct intended way..

The statement you made: ["Because the new macro processes the internal schema you don't need the .xsd file."] kind of blows my mind..
Are you saying I should NOT have been going through the steps to ADD the xml via the SOURCE map window with the original code?
Or are you saying that this new code will work better if I have performed those steps to ADD the XML via the SOURCE map area?
Or are you saying that this new code has some special way of locating the xml info without looking to the SOURCE map window (in other words it will look at a specific file stored in the same folder as the XLSM file and will extract what it needs directly from the XSD file without the need of ADDING via SOURCE?)

I don't care which way -- I just need to make sure I'm understanding the right way you've intended this last-most recent batch of code to work in conjunction with steps I'm taking.
Example:

New Code and New Steps I Need to Take to Make Sure it Works:
1-create folder, add a new XLSM file with NEW code pasted into a NEW module, name that module "modDOMdocument"
2-while in Excel, open VBA window, TOOLS > REFERENCES > check box "Microsoft XML v6.0" (and only need to do once OR need to do every time you open file)
2-add the xml file to that same folder
3-run the code (result content should appear on "Sheet1" or XXX)

or Alternatively....something like:
1-create folder, add a new XLSM file with NEW code pasted into a NEW module, name that module "modDOMdocument"
2-navigate to the Developer tab > SOURCE > XML Maps > Add (add the xml), OK
3-while in Excel, open VBA window, TOOLS > REFERENCES > check box "Microsoft XML v6.0" (and only need to do once OR need to do every time you open file)
4-add the xml file to that same folder (or maybe this is not needed if step 2 is performed? - confused on this point)
5-run the code (result content should appear on "Sheet1" or XXX)

I swear, this day can't end until I get this clear n' square =-) Thanks again for both your help/mentorship on this extremely daunting feat!
 
Upvote 0
I don't care which way -- I just need to make sure I'm understanding the right way you've intended this last-most recent batch of code to work in conjunction with steps I'm taking.
Example:

New Code and New Steps I Need to Take to Make Sure it Works:
1-create folder, add a new XLSM file with NEW code pasted into a NEW module, name that module "modDOMdocument"
2-while in Excel, open VBA window, TOOLS > REFERENCES > check box "Microsoft XML v6.0" (and only need to do once OR need to do every time you open file)
2-add the xml file to that same folder
3-run the code (result content should appear on "Sheet1" or XXX)

or Alternatively....something like:
1-create folder, add a new XLSM file with NEW code pasted into a NEW module, name that module "modDOMdocument"
2-navigate to the Developer tab > SOURCE > XML Maps > Add (add the xml), OK
3-while in Excel, open VBA window, TOOLS > REFERENCES > check box "Microsoft XML v6.0" (and only need to do once OR need to do every time you open file)
4-add the xml file to that same folder (or maybe this is not needed if step 2 is performed? - confused on this point)
5-run the code (result content should appear on "Sheet1" or XXX)

Well, I just tested both of the above with 2 brand new files and made SURE that:
=VBA References Microsoft XML 6.0 was check-marked just before running the code both times with both files.
(2 files: 1 did NOT have the XML added using the SOURCE window and the 2nd one DID have the xml added using the SOURCE window)
=The 2.5.1 file was in the same folder as the Excel XLSM file (and attempted to run it without adding the XML via the SOURCE window)
....When I attempted this, I got a new message "This workbook doesn't contain an XML schema" =-) nice (made it clear that with this code, I must ADD it) =-)

Alternatively,
.....When I attempted to use the 2nd Excel file that DID HAVE THE XML added using the SOURCE window, I got this error and here's the DEBUG highlighted area:
https://app.box.com/s/itq59kaykd7rxp9qr42reo17gvbzjxjk
 
Last edited:
Upvote 0
The statement you made: ["Because the new macro processes the internal schema you don't need the .xsd file."] kind of blows my mind..
Are you saying I should NOT have been going through the steps to ADD the xml via the SOURCE map window with the original code?
Correct, in a way. The original code (the Extract_XML_Document_Structure macro) reads the .xsd file from the local disk and parses it into the cells. It doesn't need the .xsd file to have been manually added via the XML Maps dialogue, however it doesn't matter if you have done this because the code specifically reads the file. The code would also work if you specified the URL location of the file instead of the local file name.

Or are you saying that this new code will work better if I have performed those steps to ADD the XML via the SOURCE map area?
The new code (the Extract_Internal_XML_Document_Structure macro) will only work if you have added the .xsd file to the workbook via the XML Maps dialogue. The macro itself doesn't read the .xsd file and therefore the .xsd file is not needed after you have added it.

Or are you saying that this new code has some special way of locating the xml info without looking to the SOURCE map window (in other words it will look at a specific file stored in the same folder as the XLSM file and will extract what it needs directly from the XSD file without the need of ADDING via SOURCE?)
The new code refers to the XML map stored inside the workbook, via this line:
Code:
    Set XMLinternalSchema = ThisWorkbook.XmlMaps(1).Schemas(1)
The code is not looking at a specific file stored in the same folder as the .xlsm workbook. With that line it is looking at the XML map stored inside the workbook, which you have previously added via the XML Maps dialogue. Excel's XML Source window shows the same internal map in a tree layout.

In summary, the new code requires the .xsd file to have been manually added to the workbook via the XML Maps dialogue. It is quite simple really: the original macro reads the .xsd file and the new macro reads the internal XML map/schema. I hope that helps your understanding of the two macros.

Alternatively,
.....When I attempted to use the 2nd Excel file that DID HAVE THE XML added using the SOURCE window, I got this error and here's the DEBUG highlighted area:
https://app.box.com/s/itq59kaykd7rxp9qr42reo17gvbzjxjk
The image is all green and almost unreadable. However, I think the error is because the code requires a sheet named "Sheet2" in the workbook.

You can either add or rename a sheet and name it "Sheet2", or change this line of code:
Code:
    With ThisWorkbook.Worksheets("Sheet2")
to:
Code:
    With ThisWorkbook.ActiveSheet
and then the code will write to whichever sheet is the active sheet.
 
Upvote 0

Forum statistics

Threads
1,223,908
Messages
6,175,306
Members
452,633
Latest member
DougMo

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top