Power Query column alignment

jenksdev

New Member
Joined
Sep 14, 2019
Messages
10
Hello.

First time here, I hope I'm posting this in the right area.

Please note that while the linked document contains names and addresses, it from a public database and is therefore not a breach of any kind of data protection (United Kingdom's company database from the official government website).

Anyway, what I am struggling with is the inconsistent data output. In the attached file, taking row 1353 as an example, the data shows 'country_of_residence' in the column headed 'premises'.

What I am trying to figure out is how to create a set of steps / queries in order to correctly align the data based on the column headers, as the headers will not change. Once the data is properly aligned, I will then cleanse it and remove all speech marks etc - the easy bit.

I've toyed with it, and can filter out the records that do not match the column headers, but I cannot figure out how to reassign them to the correct column.

Please could someone offer an insight in to how this is possible?

Thanks,

Ben

File is downloadable from Google Drive (in .xlsx format) here: https://drive.google.com/open?id=1JMtxILfIvSCDfoTboqEx9kXAP8QofK3T
 

Hi Peter,

It's from here: http://download.companieshouse.gov.uk/en_pscdata.html - PSC data as one file

It's a 5GB file though, and it downloads as a .txt

Thank you for your help on this one!

Ben
 
Upvote 0

Excel Facts

What is the shortcut key for Format Selection?
Ctrl+1 (the number one) will open the Format dialog for whatever is selected.


This is almost exactly what I am looking for, yes!

I would need the company number, and business address though, concatenated from the address_line_1, address_line_2, locality etc.

Thanks,

Ben
 
Upvote 0
this is 868 MB not 5 GB :)

use header names what you want split/concatenate

and don't quote whole post, please!
 
Last edited:
Upvote 0
Sorry, the only option it was giving me was to reply with whole quote.

Would you mind explaining how you got to that end result?

Thanks,

Ben
 
Upvote 0
use Reply not Reply With Quote :)

screenshot-106.png


your previous source is incorrect because there is no company number, address lin1, etc... , I am trying with original file from the link

but you can do that yourself with some splits, replaces and Unpivot and Pivot
 
Last edited:
Upvote 0
it takes some time with ca. 5 GB of data (sorry, I didn't check unzipped file size :))
 
Upvote 0
you can try this

Code:
[SIZE=1]let
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("[B][COLOR="#FF0000"]D:\test\persons-with-significant-control-snapshot-2019-09-15\persons-with-significant-control-snapshot-2019-09-15.txt[/COLOR][/B]"), null, null, 1252)}),
    RV1 = Table.ReplaceValue(Source,"percent"",""voting","percent_voting",Replacer.ReplaceText,{"Column1"}),
    Split1 = Table.SplitColumn(RV1, "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10", "Column1.11", "Column1.12", "Column1.13", "Column1.14", "Column1.15", "Column1.16", "Column1.17", "Column1.18", "Column1.19", "Column1.20", "Column1.21", "Column1.22", "Column1.23", "Column1.24", "Column1.25", "Column1.26", "Column1.27", "Column1.28", "Column1.29"}),
    RV2 = Table.ReplaceValue(Split1,":{","_",Replacer.ReplaceText,{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10", "Column1.11", "Column1.12", "Column1.13", "Column1.14", "Column1.15", "Column1.16", "Column1.17", "Column1.18", "Column1.19", "Column1.20", "Column1.21", "Column1.22", "Column1.23", "Column1.24", "Column1.25", "Column1.26", "Column1.27", "Column1.28", "Column1.29"}),
    RV3 = Table.ReplaceValue(RV2,"}","",Replacer.ReplaceText,{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10", "Column1.11", "Column1.12", "Column1.13", "Column1.14", "Column1.15", "Column1.16", "Column1.17", "Column1.18", "Column1.19", "Column1.20", "Column1.21", "Column1.22", "Column1.23", "Column1.24", "Column1.25", "Column1.26", "Column1.27", "Column1.28", "Column1.29"}),
    RV4 = Table.ReplaceValue(RV3,"{","",Replacer.ReplaceText,{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10", "Column1.11", "Column1.12", "Column1.13", "Column1.14", "Column1.15", "Column1.16", "Column1.17", "Column1.18", "Column1.19", "Column1.20", "Column1.21", "Column1.22", "Column1.23", "Column1.24", "Column1.25", "Column1.26", "Column1.27", "Column1.28", "Column1.29"}),
    Index = Table.AddIndexColumn(RV4, "Index", 1, 1),
    UnpivotOC = Table.UnpivotOtherColumns(Index, {"Index"}, "Attribute", "Value"),
    RC = Table.RemoveColumns(UnpivotOC,{"Attribute"}),
    Split2 = Table.SplitColumn(RC, "Value", Splitter.SplitTextByEachDelimiter({":"}, QuoteStyle.Csv, false), {"Value.1", "Value.2"}),
    Pivot = Table.Pivot(Split2, List.Distinct(Split2[Value.1]), "Value.1", "Value.2")
in
    Pivot[/SIZE]

change path (red line) to your source file

I suggest to use parts of this file from url you posted because whole source file is too big to see result in short time
 
Upvote 0
Ben,
After a lot of messing about because I knew nothing about JSON files I came up with this:
Code:
let
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Users\Peter\Documents\PowerQuery\persons-with-significant-control-snapshot-2019-09-15.txt"), null, null, 1252)}),
    #"Parsed JSON" = Table.TransformColumns(Source,{},Json.Document),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Parsed JSON", "Column1", {"company_number", "data"}, {"company_number", "data"}),
    #"Expanded data" = Table.ExpandRecordColumn(#"Expanded Column1", "data", {"address", "ceased_on", "country_of_residence", "date_of_birth", "etag", "kind", "links", "name", "name_elements", "nationality", "natures_of_control", "notified_on"}, {"address", "ceased_on", "country_of_residence", "date_of_birth", "etag", "kind", "links", "name", "name_elements", "nationality", "natures_of_control", "notified_on"}),
    #"Expanded date_of_birth" = Table.ExpandRecordColumn(#"Expanded data", "date_of_birth", {"month", "year"}, {"month", "year"}),
    #"Expanded links" = Table.ExpandRecordColumn(#"Expanded date_of_birth", "links", {"self"}, {"self"}),
    #"Expanded name_elements" = Table.ExpandRecordColumn(#"Expanded links", "name_elements", {"forename", "middle_name", "surname", "title"}, {"forename", "middle_name", "surname", "title"}),
    #"Extracted Values" = Table.TransformColumns(#"Expanded name_elements", {"natures_of_control", each Text.Combine(List.Transform(_, Text.From)), type text}),
    #"Expanded address" = Table.ExpandRecordColumn(#"Extracted Values", "address", {"address_line_1", "country", "locality", "postal_code", "premises", "address_line_2", "region"}, {"address_line_1", "country", "locality", "postal_code", "premises", "address_line_2", "region"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded address",{{"company_number", type text}, {"address_line_1", type text}, {"country", type text}, {"locality", type text}, {"postal_code", type text}, {"premises", type text}, {"address_line_2", type text}, {"region", type text}, {"ceased_on", type date}, {"country_of_residence", type text}, {"month", Int64.Type}, {"year", Int64.Type}, {"etag", type text}, {"kind", type text}, {"self", type text}, {"name", type text}, {"forename", type text}, {"middle_name", type text}, {"surname", type text}, {"title", type text}, {"nationality", type text}, {"natures_of_control", type text}, {"notified_on", type date}})
in
    #"Changed Type"

hope it works for you.
Peter
 
Upvote 0

Forum statistics

Threads
1,226,529
Messages
6,191,587
Members
453,666
Latest member
madelineharris

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top