Access query expression builder is creating duplicate rows in query, please help.

Maggie Barr

Board Regular
Joined
Jan 28, 2014
Messages
188
Greetings and thank you in advance if you can help.
I am working on a PC using Office Pro 2019 desktop version.

I am dealing with some bird data, and I have a query that has all of the Atlas blocks for the state. I am trying to bring data in from another query with regard to the amount of birding effort in each block. The tricky part is the effort query has all of the blocks, with one column for time period (there are two periods, period 1 & Period 2), so there may be two rows of data for each block in the effort query. I can't just join the queries and transpose the time period field to columns while keeping the effort reported for each that is associated with it that I know of.

In the Block Query, I tried to create two columns, one for period 1 and one for period 2. I tried to use a formula in the expression builder to bring in the effort reported for the block for the time period the column represents, but when I did that, it just created a duplicate row for the block reported.

My two formulas are:
Period_1 Effort: IIf([Atlas_Winter_Blocks_ArcMap]![BlockCode]=[Effort]![BlockCode] And [Effort]![WINTER_TIME_PERIOD]="Period_1",[Effort]![SumOfMaxOfFINAL_DURATION_HOURS],0)

Period_2 Effort: IIf([Atlas_Winter_Blocks_ArcMap]![BlockCode]=[Effort]![BlockCode] And [Effort]![WINTER_TIME_PERIOD]="Period_2",[Effort]![SumOfMaxOfFINAL_DURATION_HOURS],0)

I ran the query with both those formulas, and it created the duplicate rows. The first query went from 4,246 rows to 5,463 rows. Not all blocks have effort to report, so there aren't duplicates in those.

Does anyone have any thoughts on how I might get this formula to work without adding rows to the original query?

I ran another query off of that one and applied "totals" (grouping) to the query in design view, with period 1 & period 2 set to sum to compress the dataset and remove duplicates, but that just seems like a very gaumy inefficient way to get this done.

Thank you again if you can help.
Best Wishes,
Maggie Barr
 

Excel Facts

Best way to learn Power Query?
Read M is for (Data) Monkey book by Ken Puls and Miguel Escobar. It is the complete guide to Power Query.
When you get unintended duplicate records, it is usually a sign that you have not joined the tables/query correctly, and/or you have some database design issues.

Can you please post a small sampling of the data tables/queries you are trying to query on, and then post the SQL code of the query you created (just switch to SQL View and copy and paste the code here)?
 
Upvote 0
When you get unintended duplicate records, it is usually a sign that you have not joined the tables/query correctly, and/or you have some database design issues.

Can you please post a small sampling of the data tables/queries you are trying to query on, and then post the SQL code of the query you created (just switch to SQL View and copy and paste the code here)?
Joe4,
Thanks for reaching out to help.
My Block file Query looks like this but with 4,246 rows, a single row for each BlockCode encompassing all blocks in the state.
ModifiedBlockTypeCoordRegioBlockCode
RegularKennebago LakeSkinner NE_SW
RegularKennebago LakeSkinner NE_SE
PriorityJackmanJackman_SW

My Effort File looks like this, but has 3,089 rows of data, with some blocks having two rows because there was effort for two time periods. As well, some blocks aren't in the file because they don't have any effort data. As you can see for instance Abol Pond NE is in there with two rows because there is time for both time periods. This query comes from crunching down a large dataset of observation records with an observation date (the full file is roughly 3.5 mill, but the winter data observations constitute over 580,000 records). I convert that date to a time period based on a lookup table, so from the start, each observation is identified as period 1 or period 2, I can't create those as columns. Then I run the data through quality parameters to filter it further and summarize the results.
BlockCodeWINTER_TIME_PERIODSumOfMaxOfFINAL_DURATION_HOURS
Abol Pond_CEPeriod_20.03333
Abol Pond_NEPeriod_10.01667
Abol Pond_NEPeriod_20.03334
Abol Pond_NWPeriod_29.50001
Addison_CEPeriod_12.78334
Addison_CEPeriod_20.36667
Addison_CWPeriod_20.13333

What the output of my query provided was a dataset with new rows duplicating the blocks sometimes as it brought in the effort. I joined the Atlas Block table to the effort using BlockCode, relationship type 2, but I never brought in any data from the effort by dragging it in because I can't turn Winter Time Period into two columns of Period_1 and Period_2 from the one column. So I thought I would try to create columns in the query to bring the data in based on the matching of BlockCode between the queries and using AND in the formula to only get Period_1 or Period_2 effort See Formulas in first post).
You can see that, though the base query only has one row for each block, after running the formulas, it created two rows, like for Abol Pond NE. Listing one row for period 1 effort and one row for period 2 effort.
ModifiedBlockTypeCoordRegioBlockCodePeriod_1 EffortPeriod_2 Effort
RegularHoultonAbol Pond_CE00.03333
RegularHoultonAbol Pond_CW00
RegularHoultonAbol Pond_NE0.016670
RegularHoultonAbol Pond_NE00.03334
PriorityHoultonAbol Pond_NW09.50001
RegularHoultonAbol Pond_SE00
RegularHoultonAbol Pond_SW00
RegularColumbia FallsAddison_CE2.783340
RegularColumbia FallsAddison_CE00.36667
PriorityColumbia FallsAddison_CW00.13333

The SQL of the query that produced the table above, which does not include the grouping I did to grou them after this is:

SELECT Atlas_Winter_Blocks_ArcMap.[All Maine Regions], Atlas_Winter_Blocks_ArcMap.[All Maine Regions; ModifiedBlockType], Atlas_Winter_Blocks_ArcMap.ModifiedBlockType, Atlas_Winter_Blocks_ArcMap.CoordRegio, Atlas_Winter_Blocks_ArcMap.[CoordRegio; ModifiedBlockType], Atlas_Winter_Blocks_ArcMap.BlockCode, IIf([Atlas_Winter_Blocks_ArcMap]![BlockCode]=[Final Grouping 2 Query]![BlockCode] And [Final Grouping 2 Query]![WINTER_TIME_PERIOD]="Period_1",[Final Grouping 2 Query]![SumOfMaxOfFINAL_DURATION_HOURS],0) AS [Period_1 Effort], IIf([Atlas_Winter_Blocks_ArcMap]![BlockCode]=[Final Grouping 2 Query]![BlockCode] And [Final Grouping 2 Query]![WINTER_TIME_PERIOD]="Period_2",[Final Grouping 2 Query]![SumOfMaxOfFINAL_DURATION_HOURS],0) AS [Period_2 Effort]
FROM Atlas_Winter_Blocks_ArcMap LEFT JOIN [Final Grouping 2 Query] ON Atlas_Winter_Blocks_ArcMap.BlockCode = [Final Grouping 2 Query].BlockCode;

What I need my query of the base block file to look like after bringing in the data from the effort query, and what I have after my convoluted approach using grouping after the previous query is:
ModifiedBlockTypeCoordRegioBlockCodePeriod_1 EffortPeriod_2 Effort
PriorityAuroraAlligator Lake_NW4.233333.11666
PriorityAuroraAmherst_NW6.616693.633317
PriorityAuroraBottle Lake_NW10
PriorityAuroraBrandy Pond_CE00
PriorityAuroraBurlington_NW00
PriorityAuroraDuck Lake_NW00
PriorityAuroraGassabias Lake_NW00
PriorityAuroraGreat Pond_NW5.233334.56667
PriorityAuroraGreenfield_NW00
PriorityAuroraHopkins Pond_NW3.23.19967
PriorityAuroraLead Mtn_NW01.283
PriorityAuroraLead Mtn_SE00.533
PriorityAuroraLee_NW00
PriorityAuroraLincoln East_NW5.033340.516667
PriorityAuroraQuillpig Mtn_NW00
PriorityAuroraRocky Pond_NW3.416663.95

Hopefully this helps somewhat with understanding what is going on.
Than you again for reaching out!
Maggie
 
Upvote 0
OK, so it appears that everything is in order, and the issue (not really an "issue", but more of the "situation" due to the data structure, which is fine) is that you are dealing with a one-to-many relationship between the two tables.The "BlockCode" field that you are joining on is unique in the one table, but can be duplicated in the other table. So naturally, the query between those two tables will result in duplicate BlockCodes since they are already duplicated in your second table.

How to deal with that? Exactly as you have described here:
I ran another query off of that one and applied "totals" (grouping) to the query in design view, with period 1 & period 2 set to sum to compress the dataset and remove duplicates, but that just seems like a very gaumy inefficient way to get this done.
There is nothing wrong with this method - that is exactly how you deal with relationships like this.
You will find that Aggregate Queries are one of the most powerful tools you use in database queries, and are your good friend!
 
Upvote 0
OK, so it appears that everything is in order, and the issue (not really an "issue", but more of the "situation" due to the data structure, which is fine) is that you are dealing with a one-to-many relationship between the two tables.The "BlockCode" field that you are joining on is unique in the one table, but can be duplicated in the other table. So naturally, the query between those two tables will result in duplicate BlockCodes since they are already duplicated in your second table.

How to deal with that? Exactly as you have described here:

There is nothing wrong with this method - that is exactly how you deal with relationships like this.
You will find that Aggregate Queries are one of the most powerful tools you use in database queries, and are your good friend!
Joe4,
AWESOME!!! Thank you! I knew that structurally the data does this, as I have dealt with it in Excel Power Query with the same issue and had to perform essentially the same thing. I had just hoped that there was some way in the formula to force it to "insert" the data for the row, thus not creating a duplicate row. I am glad to hear that there is nothing wrong with my method, actually super psyched as I don't consider myself to be "trained" in any of this. I use the "totals" aka grouping a ton in my data for many manipulations. My files are quite large and I have multiple Access databases to process the data because of file size caps of the program (8 in entirety for just the breeding data, then I have the winter data), so I always try to ensure I am doing things as efficiently as possible to keep memory use and file size as small as possible.
Thank you so much for reaching out to help and letting me know I had in fact done what needed to be done appropriately and as efficiently as possible.
Best Wishes,
Maggie Barr
 
Upvote 0
You are welcome.
Yes, the query itself is really not creating any duplicates, it is just mirroring the duplicates that already exist in the second table.
Aggregate queries can be used to group records to "compress" that data down to one file.

Sometimes you will see duplicate records be created when there are not and duplicates in the underlying table. This often happens when the two tables need to be joined by multiple fields, but the person has only joined on one. A simple example of this is if you were joining on names, and there were separate fields for first and last name. If you just joined on last names, that is not enough, because multiple people could have the same last name, so it may create unintended duplicates.
 
Upvote 0

Forum statistics

Threads
1,225,739
Messages
6,186,746
Members
453,370
Latest member
juliewar

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top