Match cells between documents

laith93

New Member
Joined
Sep 10, 2022
Messages
5
Office Version
  1. 2010
Platform
  1. Windows
Hi All,
Firstly, I'm sorry for cross-posting, but please I want help for my problem

I have 2 excel files, 1 and 2 (see attached files)
In 1, there are multiple sheets, named according to group names for students, and in each sheet, there are student names and their group names (two columns: Name and Group).
97.png

In 2, there are student names and their scores in many subjects, but without their group name (two columns: Name and Scores).
98.png


Now, I want to add a third column to file 2 for group name based on data in file 1, copy paste group names will not make sense, because the names in file 2 are random.
So, I think it requires Match function or something else.
Please any fix or any other idea to fix this issue?
Thanks
 

Excel Facts

How can you automate Excel?
Press Alt+F11 from Windows Excel to open the Visual Basic for Applications (VBA) editor.
If this is cross-posted, could you please include a link to the cross-post (and also include a link to this one in the other post) so that anyone offering help has the benefit of knowing what has been contributed?
I have a few questions:
  1. In File1 you show a "Name" column with Test 1, Test 2, Test 3 and all are in Group A. In File2 you show a "Name" column with Test1, Test2, ... associated with a Score column. Is Test 1 (with a space between "Test" and "1") supposed to be associated with "Test1" (no space)? Are the actual names in your workbooks exact matches for each other?
  2. In the small image for File1 you show names for only Group A. What is the name of this worksheet? Is it "GroupA", or something similar? Are any of the names on this worksheet associated with any group other than A?
  3. What are the names of the worksheets in File1?
  4. Is each student associated with only one group?
  5. In File2, what do the scores represent?...are these actual test scores, and are the scores associated with a "test" or with a person's name? I don't understand how one could possibly match a "test" in File2 with a person's name based on what is shown...or perhaps the nomenclature chosen in confusing. Could you clarify this, please?
 
Upvote 0
If the "Name" columns in each file are intended to be exact matches for each other and those matches represent the only way to relate the data from each file, I would recommend consolidating all of the worksheets in File1 into a single table consisting of two or three columns (Name, Group, and perhaps the sheet name holding these data). This is done easily from a new 3rd file that accesses all of the worksheets in File1 using Power Query with the following M Code:
Power Query:
let
    Source = Excel.Workbook(File.Contents("C:\Users\...full path to file...\MrExcel_20220909_laith93_1.xlsx"), null, true),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Name", "Data"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Other Columns", "Data", {"Column1", "Column2"}, {"Data.Column1", "Data.Column2"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Expanded Data", [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"GroupA", type text}, {"Name", type text}, {"Group", type text}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Name] <> "Name")),
    #"Renamed Columns" = Table.RenameColumns(#"Filtered Rows",{{"GroupA", "Sheet"}})
in
    #"Renamed Columns"
Then a new query is created in this 3rd file to pull in the information from File2, consisting of a two-column table (Name and Score) using this M Code:
Power Query:
let
    Source = Excel.Workbook(File.Contents("C:\Users\...full path to file...\MrExcel_20220909_laith93_2.xlsx"), null, true),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Name", "Data"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Other Columns", each ([Name] = "Name-Score")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Name"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", {"Column1", "Column2"}, {"Column1", "Column2"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Expanded Data", [PromoteAllScalars=true])
in
    #"Promoted Headers"
Then a Merge Query operation is performed, again from Power Query in this 3rd file, to match the "Name" columns in the first two queries. This brings the corresponding scores for each name into the same table. The resulting 4-column table (Name, Score, Group, and Sheet Name) can then be loaded back to a worksheet in the 3rd file. Any subsequent changes to File1 or File2 will be reflected in the File3 output table by right-clicking in the output table and choosing "Refresh". The M Code for performing the merge that produces the final output table:
Power Query:
let
    Source = Table.NestedJoin(#"Name-Score", {"Name"}, File1All, {"Name"}, "File1All", JoinKind.LeftOuter),
    #"Expanded File1All" = Table.ExpandTableColumn(Source, "File1All", {"Sheet", "Group"}, {"Sheet", "Group"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Expanded File1All",{"Name", "Score", "Group", "Sheet"})
in
    #"Reordered Columns"
Exiting Power Query with File>Close and Load To and then specifying a worksheet in File3 will produce an output similar to that below. This output table could be arranged/sorted as desired in Power Query before loading it back into the worksheet.
NameScoreGroupSheet
Test 1770BGroupB
Test 194AGroupA
Test 2373CGroupC
Test 294AGroupA
Test 571AGroupA
Test 390AGroupA

If your profile is correct (and you have Excel 2010?), you may need to investigate whether Power Query has been installed...or can be installed.
If you have Power Query installed and are interested in pursuing this approach, I can provide links to the three files I mocked up to demonstrate the approach.
 
Upvote 0
If this is cross-posted, could you please include a link to the cross-post (and also include a link to this one in the other post) so that anyone offering help has the benefit of knowing what has been contributed?
I have a few questions:
  1. In File1 you show a "Name" column with Test 1, Test 2, Test 3 and all are in Group A. In File2 you show a "Name" column with Test1, Test2, ... associated with a Score column. Is Test 1 (with a space between "Test" and "1") supposed to be associated with "Test1" (no space)? Are the actual names in your workbooks exact matches for each other?
  2. In the small image for File1 you show names for only Group A. What is the name of this worksheet? Is it "GroupA", or something similar? Are any of the names on this worksheet associated with any group other than A?
  3. What are the names of the worksheets in File1?
  4. Is each student associated with only one group?
  5. In File2, what do the scores represent?...are these actual test scores, and are the scores associated with a "test" or with a person's name? I don't understand how one could possibly match a "test" in File2 with a person's name based on what is shown...or perhaps the nomenclature chosen in confusing. Could you clarify this, please?
Thanks for reply
Thanks for your help
Thanks for your time,

Here is the link for cross-posting (I didn't get a reply)
Match cells between documents

Dear,
1:
In file 1, there are multiple sheets, each named according to group name like (A, B,.... etc), I attached a screen for sheet A only.
I have actual names, like Jhon Smith, Sary Hana.... etc, so I made mistake by writing names in file 2 without space (Test1), actually, the names contain space.
2,3,4:
The sheet names are A, B, C ... etc,
Names in each group are unique, so the names are not duplicated in any other group or sheet.
5:
The scores represent student scores for different exams and of course, each score is linked with the student's name
 
Upvote 0
If the "Name" columns in each file are intended to be exact matches for each other and those matches represent the only way to relate the data from each file, I would recommend consolidating all of the worksheets in File1 into a single table consisting of two or three columns (Name, Group, and perhaps the sheet name holding these data). This is done easily from a new 3rd file that accesses all of the worksheets in File1 using Power Query with the following M Code:
Power Query:
let
    Source = Excel.Workbook(File.Contents("C:\Users\...full path to file...\MrExcel_20220909_laith93_1.xlsx"), null, true),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Name", "Data"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Other Columns", "Data", {"Column1", "Column2"}, {"Data.Column1", "Data.Column2"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Expanded Data", [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"GroupA", type text}, {"Name", type text}, {"Group", type text}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Name] <> "Name")),
    #"Renamed Columns" = Table.RenameColumns(#"Filtered Rows",{{"GroupA", "Sheet"}})
in
    #"Renamed Columns"
Then a new query is created in this 3rd file to pull in the information from File2, consisting of a two-column table (Name and Score) using this M Code:
Power Query:
let
    Source = Excel.Workbook(File.Contents("C:\Users\...full path to file...\MrExcel_20220909_laith93_2.xlsx"), null, true),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Name", "Data"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Other Columns", each ([Name] = "Name-Score")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Name"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", {"Column1", "Column2"}, {"Column1", "Column2"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Expanded Data", [PromoteAllScalars=true])
in
    #"Promoted Headers"
Then a Merge Query operation is performed, again from Power Query in this 3rd file, to match the "Name" columns in the first two queries. This brings the corresponding scores for each name into the same table. The resulting 4-column table (Name, Score, Group, and Sheet Name) can then be loaded back to a worksheet in the 3rd file. Any subsequent changes to File1 or File2 will be reflected in the File3 output table by right-clicking in the output table and choosing "Refresh". The M Code for performing the merge that produces the final output table:
Power Query:
let
    Source = Table.NestedJoin(#"Name-Score", {"Name"}, File1All, {"Name"}, "File1All", JoinKind.LeftOuter),
    #"Expanded File1All" = Table.ExpandTableColumn(Source, "File1All", {"Sheet", "Group"}, {"Sheet", "Group"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Expanded File1All",{"Name", "Score", "Group", "Sheet"})
in
    #"Reordered Columns"
Exiting Power Query with File>Close and Load To and then specifying a worksheet in File3 will produce an output similar to that below. This output table could be arranged/sorted as desired in Power Query before loading it back into the worksheet.
NameScoreGroupSheet
Test 1770BGroupB
Test 194AGroupA
Test 2373CGroupC
Test 294AGroupA
Test 571AGroupA
Test 390AGroupA

If your profile is correct (and you have Excel 2010?), you may need to investigate whether Power Query has been installed...or can be installed.
If you have Power Query installed and are interested in pursuing this approach, I can provide links to the three files I mocked up to demonstrate the approach.
Wow, I think you have got the idea,
Really, the original file 1 contains multiple sheets, in each sheet, there is a column for student's name only, but I added Group column just for identification (when using a function). However, if you think this column is not necessary, you can ignore it.
As you said, only Name column represents the only way to relate the data from each file.

Actually, I'm not used PowerQuery before, So I will googled it
and for my office ( I have 2 PCs, i3 with office 2010 (more used), i7 with office 2019).

Thanks thanks thanks for these info
I appreciate you help and time.
 
Upvote 0
Here are links to the three files, shown in order for file 1 with the names and groups, file 2 with the names and scores, and file 3 with the Power Query code and a worksheet that receives the results table output by PQ.
I would recommend trying out this set of three files first. Place all of them in the same folder and note the full path to the files (probably something like "C:\Users\...\full file name"). Then open the 3rd file and explore which Queries and Connections are found in the workbook (Data > Queries & Connections...and you should see a side window display three queries/connections named "File1All", "Name-Score", and "Merge1"). The File1All query builds a consolidated table of all worksheets found in the 1st file. This consolidated table is viewable in Power Query, but it is not loaded into a worksheet where it can be seen. This is why it is called a "Connection only" query--you can connect to this virtual table, but you can't directly see it on a worksheet. Similarly, the Name-Score query builds a connection only query of the names and scores on the worksheet in the 2nd file. The M code for each of these will need to be modified so that it knows where the source workbooks are on your machine. To do that, it might be easiest to directly edit the M code using the Power Query editor.

If you hover the mouse cursor over "File1All connection only" in the side window (or right click on it), a new popup window will display a snapshot of what the resulting table looks like. At the bottom of that popup window you will see Edit. Click on Edit to open Power Query. Then click on View > Advanced Editor and edit the first line of M Code, replacing my path ("C:\Users\kirkr...") with your path (leave the file name unchanged for now). Then click Done to close out the editing window and click File > Close and Load, which will execute the code and return you to the Excel worksheet. Repeat this process for the Name-Score connection only query. The Merge1 query should not require editing, as it picks up from the Name-Score query and performs a merge operation with the File1All query (and the names of those queries have not been changed). That's all. You should be able to add worksheets, edit student names or scores in the other two workbooks and save them. You can then close those two files. Then in file 3, right click on the summary table and click "Refresh" (or choose Data > Refresh from the top level menu) and the table should update.

After you have this working on your system, you can either move your actual data into these files or the following (which is what I would do): 1) confirm that your file 1 and file 2 have the same structure as mine and then 2) simply edit the M Code (as described above) by changing the name of the Source files.
 
Upvote 0
Solution
Here are links to the three files, shown in order for file 1 with the names and groups, file 2 with the names and scores, and file 3 with the Power Query code and a worksheet that receives the results table output by PQ.
I would recommend trying out this set of three files first. Place all of them in the same folder and note the full path to the files (probably something like "C:\Users\...\full file name"). Then open the 3rd file and explore which Queries and Connections are found in the workbook (Data > Queries & Connections...and you should see a side window display three queries/connections named "File1All", "Name-Score", and "Merge1"). The File1All query builds a consolidated table of all worksheets found in the 1st file. This consolidated table is viewable in Power Query, but it is not loaded into a worksheet where it can be seen. This is why it is called a "Connection only" query--you can connect to this virtual table, but you can't directly see it on a worksheet. Similarly, the Name-Score query builds a connection only query of the names and scores on the worksheet in the 2nd file. The M code for each of these will need to be modified so that it knows where the source workbooks are on your machine. To do that, it might be easiest to directly edit the M code using the Power Query editor.

If you hover the mouse cursor over "File1All connection only" in the side window (or right click on it), a new popup window will display a snapshot of what the resulting table looks like. At the bottom of that popup window you will see Edit. Click on Edit to open Power Query. Then click on View > Advanced Editor and edit the first line of M Code, replacing my path ("C:\Users\kirkr...") with your path (leave the file name unchanged for now). Then click Done to close out the editing window and click File > Close and Load, which will execute the code and return you to the Excel worksheet. Repeat this process for the Name-Score connection only query. The Merge1 query should not require editing, as it picks up from the Name-Score query and performs a merge operation with the File1All query (and the names of those queries have not been changed). That's all. You should be able to add worksheets, edit student names or scores in the other two workbooks and save them. You can then close those two files. Then in file 3, right click on the summary table and click "Refresh" (or choose Data > Refresh from the top level menu) and the table should update.

After you have this working on your system, you can either move your actual data into these files or the following (which is what I would do): 1) confirm that your file 1 and file 2 have the same structure as mine and then 2) simply edit the M Code (as described above) by changing the name of the Source files.
Thanks, thanks thanks thanks thanks
I can't thank you enough
I appreciate your time and efforts
Really, I googled and watched some Youtube videos and that is ok
My problem was solved and everything is ok.

As your recommendation,
" I would recommend consolidating all of the worksheets in File1 into a single table consisting of two or three columns (Name, Group, and perhaps the sheet name holding these data)."
I made a file with one sheet containing the above info (Name, Group ...etc), but in your file 1, there are multiple sheets with the group name
the question is how you get the data in multiple sheets in a single table in a power query named "File1All", please I want to learn this technique in steps, not by code.

Just a last question for confirmation,
if the number of records in file 1 does not equal the number of records in file 2, is this affect the results?

Thank you again
Best Regards
 
Upvote 0
Yes, based on your initial description:
In 1, there are multiple sheets, named according to group names for students, and in each sheet, there are student names and their group names (two columns: Name and Group).
...my understanding is that you have multiple sheets in File1 and each of those sheets are named for the "group". The best tutorial I've seen for doing this is by Chandoo:
The method involves beginning with a blank workbook and creating a query that obtains data from File1. As you follow the steps in the video, you will see how this makes all of the sheets in File1 available. You can then filter the list of sheets to eliminate any that should not be included. The initial steps taken essentially create a table where each Data cell contains another table (and that table represents everything on the source worksheet). As you click through the steps using the menu option in Power Query, the M Code is being built behind the scenes. You can review the code using View > Advanced Editor, or examine the code by looking at each individual step's code in the formula bar. But this is probably the best way to learn the technique step-by-step.

I used a very similar method for creating the Name-Score query from File2. I noticed that even though there is only one worksheet, the original Source steps reveals a hidden table (I'm not sure why), so I filtered it out using the drop-down arrow in the Name column heading...to keep only the "Name-Score" worksheet. The remaining steps are the same as the first query.

Then for the 3rd major step, while you are at the last step of the Name-Score query, and you see the final results obtained from File2 in Power Query, click on Home > Merge Queries > Merge Queries as New. This opens up a dialog window where the Name-Score query is shown first. Then select File1All as the other query whose information you want to bring over to the corresponding rows in the Name-Score query. In order to perform this matching, you need to indicate which columns are used for matching, so click on the "Name" column for each query and indicate that you want a "Left Outer" join. That will take all of the items in the 1st query (Name-Score) and find the corresponding match in the 2nd query (File1All). That will give you another table whose cells contain small tables...and this needs to be cleaned up. To clean up these results, you click on the Expand symbol (the icon with a left & right arrow) in the File1All column heading, and a window appears showing which columns are in the mini-table. Here you will want to select only the Sheet and the Group, and do not use the option for keeping the original column name as a prefix. That's the last step...you can drag the column headings to present them in a different order and sort the results if desired before clicking on Home > Close & Load...which sends the final table back to the Excel worksheet.

These steps produce three queries...the first two are tables representing information in File1 and File2, and the last represents the merged results.
 
Upvote 0
Yes, based on your initial description:

...my understanding is that you have multiple sheets in File1 and each of those sheets are named for the "group". The best tutorial I've seen for doing this is by Chandoo:
The method involves beginning with a blank workbook and creating a query that obtains data from File1. As you follow the steps in the video, you will see how this makes all of the sheets in File1 available. You can then filter the list of sheets to eliminate any that should not be included. The initial steps taken essentially create a table where each Data cell contains another table (and that table represents everything on the source worksheet). As you click through the steps using the menu option in Power Query, the M Code is being built behind the scenes. You can review the code using View > Advanced Editor, or examine the code by looking at each individual step's code in the formula bar. But this is probably the best way to learn the technique step-by-step.

I used a very similar method for creating the Name-Score query from File2. I noticed that even though there is only one worksheet, the original Source steps reveals a hidden table (I'm not sure why), so I filtered it out using the drop-down arrow in the Name column heading...to keep only the "Name-Score" worksheet. The remaining steps are the same as the first query.

Then for the 3rd major step, while you are at the last step of the Name-Score query, and you see the final results obtained from File2 in Power Query, click on Home > Merge Queries > Merge Queries as New. This opens up a dialog window where the Name-Score query is shown first. Then select File1All as the other query whose information you want to bring over to the corresponding rows in the Name-Score query. In order to perform this matching, you need to indicate which columns are used for matching, so click on the "Name" column for each query and indicate that you want a "Left Outer" join. That will take all of the items in the 1st query (Name-Score) and find the corresponding match in the 2nd query (File1All). That will give you another table whose cells contain small tables...and this needs to be cleaned up. To clean up these results, you click on the Expand symbol (the icon with a left & right arrow) in the File1All column heading, and a window appears showing which columns are in the mini-table. Here you will want to select only the Sheet and the Group, and do not use the option for keeping the original column name as a prefix. That's the last step...you can drag the column headings to present them in a different order and sort the results if desired before clicking on Home > Close & Load...which sends the final table back to the Excel worksheet.

These steps produce three queries...the first two are tables representing information in File1 and File2, and the last represents the merged results.
Thanks, thanks thanks
I can't thank you enough
Billion thank you

Sorry for annoying you, please an answer,
if the number of records in file 1 does not equal the number of records in file 2, is this affect the results?

Thank you again
Best Regards
 
Upvote 0
The results could be affected, depending on the mismatch. But that behavior can be modified by changing how the two tables are merged. Currently, we start with the File2 table and use all rows in it and attempt to find matching records in File1. This has the potential to miss some matches. If you prefer, the merge could consider all rows from the tables produced by the File1 and File2 workbooks (i.e., the File1All and Name-Score queries, respectively). If you want this, there would be potential for a mismatch in both directions (either a Name-Group exists with no score, or a Name-Score exists with no group). Is this what you want.
 
Upvote 0

Forum statistics

Threads
1,224,818
Messages
6,181,152
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top