Access query expression builder is creating duplicate rows in query, please help.

Maggie Barr · Apr 13, 2021

Greetings and thank you in advance if you can help.
I am working on a PC using Office Pro 2019 desktop version.

I am dealing with some bird data, and I have a query that has all of the Atlas blocks for the state. I am trying to bring data in from another query with regard to the amount of birding effort in each block. The tricky part is that the effort query has one row per block per time period (there are two periods, Period 1 and Period 2), so there may be two rows of data for each block in the effort query. As far as I know, I can't just join the queries and transpose the time period field into columns while keeping the effort reported for each period.

In the Block Query, I tried to create two columns, one for period 1 and one for period 2. I tried to use a formula in the expression builder to bring in the effort reported for the block for the time period the column represents, but when I did that, it just created a duplicate row for the block reported.

My two formulas are:
Period_1 Effort: IIf([Atlas_Winter_Blocks_ArcMap]![BlockCode]=[Effort]![BlockCode] And [Effort]![WINTER_TIME_PERIOD]="Period_1",[Effort]![SumOfMaxOfFINAL_DURATION_HOURS],0)

Period_2 Effort: IIf([Atlas_Winter_Blocks_ArcMap]![BlockCode]=[Effort]![BlockCode] And [Effort]![WINTER_TIME_PERIOD]="Period_2",[Effort]![SumOfMaxOfFINAL_DURATION_HOURS],0)

I ran the query with both those formulas, and it created the duplicate rows. The first query went from 4,246 rows to 5,463 rows. Not all blocks have effort to report, so there aren't duplicates in those.

Does anyone have any thoughts on how I might get this formula to work without adding rows to the original query?

I ran another query off of that one and applied "totals" (grouping) to the query in design view, with period 1 & period 2 set to sum to compress the dataset and remove duplicates, but that just seems like a very gaumy inefficient way to get this done.

Thank you again if you can help.
Best Wishes,
Maggie Barr

Joe4 · Apr 13, 2021

When you get unintended duplicate records, it is usually a sign that you have not joined the tables/query correctly, and/or you have some database design issues.

Can you please post a small sampling of the data tables/queries you are trying to query on, and then post the SQL code of the query you created (just switch to SQL View and copy and paste the code here)?

Maggie Barr · Apr 13, 2021

Joe4 said:
When you get unintended duplicate records, it is usually a sign that you have not joined the tables/query correctly, and/or you have some database design issues.

Can you please post a small sampling of the data tables/queries you are trying to query on, and then post the SQL code of the query you created (just switch to SQL View and copy and paste the code here)?

Joe4,
Thanks for reaching out to help.
My Block file Query looks like this but with 4,246 rows, a single row for each BlockCode encompassing all blocks in the state.

ModifiedBlockType	CoordRegio	BlockCode
Regular	Kennebago Lake	Skinner NE_SW
Regular	Kennebago Lake	Skinner NE_SE
Priority	Jackman	Jackman_SW

My Effort file looks like this, but has 3,089 rows of data, with some blocks having two rows because there was effort for two time periods. Some blocks aren't in the file at all because they don't have any effort data. For instance, Abol Pond_NE is in there with two rows because there is time for both time periods. This query comes from crunching down a large dataset of observation records with an observation date (the full file is roughly 3.5 million records, and the winter observations make up over 580,000 of them). I convert that date to a time period based on a lookup table, so from the start each observation is identified as Period 1 or Period 2; I can't create those as columns. Then I run the data through quality parameters to filter it further and summarize the results.

BlockCode	WINTER_TIME_PERIOD	SumOfMaxOfFINAL_DURATION_HOURS
Abol Pond_CE	Period_2	0.03333
Abol Pond_NE	Period_1	0.01667
Abol Pond_NE	Period_2	0.03334
Abol Pond_NW	Period_2	9.50001
Addison_CE	Period_1	2.78334
Addison_CE	Period_2	0.36667
Addison_CW	Period_2	0.13333

The output of my query was a dataset with new rows that sometimes duplicated the blocks as it brought in the effort. I joined the Atlas Block table to the effort query on BlockCode using join type 2 (a LEFT JOIN in the SQL below), but I never brought in any data from the effort query by dragging fields in, because I can't turn the single Winter Time Period column into two columns, Period_1 and Period_2. So I thought I would try to create columns in the query to bring the data in based on matching BlockCode between the queries, using AND in the formula to get only Period_1 or Period_2 effort (see formulas in first post).
You can see that, though the base query only has one row for each block, after running the formulas it created two rows for blocks like Abol Pond_NE: one row for Period 1 effort and one row for Period 2 effort.

ModifiedBlockType	CoordRegio	BlockCode	Period_1 Effort	Period_2 Effort
Regular	Houlton	Abol Pond_CE	0	0.03333
Regular	Houlton	Abol Pond_CW	0	0
Regular	Houlton	Abol Pond_NE	0.01667	0
Regular	Houlton	Abol Pond_NE	0	0.03334
Priority	Houlton	Abol Pond_NW	0	9.50001
Regular	Houlton	Abol Pond_SE	0	0
Regular	Houlton	Abol Pond_SW	0	0
Regular	Columbia Falls	Addison_CE	2.78334	0
Regular	Columbia Falls	Addison_CE	0	0.36667
Priority	Columbia Falls	Addison_CW	0	0.13333

The SQL of the query that produced the table above, which does not include the grouping I did afterward, is:

SELECT Atlas_Winter_Blocks_ArcMap.[All Maine Regions],
    Atlas_Winter_Blocks_ArcMap.[All Maine Regions; ModifiedBlockType],
    Atlas_Winter_Blocks_ArcMap.ModifiedBlockType,
    Atlas_Winter_Blocks_ArcMap.CoordRegio,
    Atlas_Winter_Blocks_ArcMap.[CoordRegio; ModifiedBlockType],
    Atlas_Winter_Blocks_ArcMap.BlockCode,
    IIf([Atlas_Winter_Blocks_ArcMap]![BlockCode]=[Final Grouping 2 Query]![BlockCode] And [Final Grouping 2 Query]![WINTER_TIME_PERIOD]="Period_1",[Final Grouping 2 Query]![SumOfMaxOfFINAL_DURATION_HOURS],0) AS [Period_1 Effort],
    IIf([Atlas_Winter_Blocks_ArcMap]![BlockCode]=[Final Grouping 2 Query]![BlockCode] And [Final Grouping 2 Query]![WINTER_TIME_PERIOD]="Period_2",[Final Grouping 2 Query]![SumOfMaxOfFINAL_DURATION_HOURS],0) AS [Period_2 Effort]
FROM Atlas_Winter_Blocks_ArcMap
LEFT JOIN [Final Grouping 2 Query]
    ON Atlas_Winter_Blocks_ArcMap.BlockCode = [Final Grouping 2 Query].BlockCode;

What I need my query of the base block file to look like after bringing in the data from the effort query, and what I have after my convoluted approach using grouping after the previous query is:

ModifiedBlockType	CoordRegio	BlockCode	Period_1 Effort	Period_2 Effort
Priority	Aurora	Alligator Lake_NW	4.23333	3.11666
Priority	Aurora	Amherst_NW	6.61669	3.633317
Priority	Aurora	Bottle Lake_NW	1	0
Priority	Aurora	Brandy Pond_CE	0	0
Priority	Aurora	Burlington_NW	0	0
Priority	Aurora	Duck Lake_NW	0	0
Priority	Aurora	Gassabias Lake_NW	0	0
Priority	Aurora	Great Pond_NW	5.23333	4.56667
Priority	Aurora	Greenfield_NW	0	0
Priority	Aurora	Hopkins Pond_NW	3.2	3.19967
Priority	Aurora	Lead Mtn_NW	0	1.283
Priority	Aurora	Lead Mtn_SE	0	0.533
Priority	Aurora	Lee_NW	0	0
Priority	Aurora	Lincoln East_NW	5.03334	0.516667
Priority	Aurora	Quillpig Mtn_NW	0	0
Priority	Aurora	Rocky Pond_NW	3.41666	3.95

Hopefully this helps somewhat with understanding what is going on.
Thank you again for reaching out!
Maggie

Joe4 · Apr 14, 2021

OK, so it appears that everything is in order, and the issue (not really an "issue", but more of a "situation" due to the data structure, which is fine) is that you are dealing with a one-to-many relationship between the two tables. The "BlockCode" field that you are joining on is unique in the one table, but can be duplicated in the other table. So naturally, the query between those two tables will result in duplicate BlockCodes since they are already duplicated in your second table.

How to deal with that? Exactly as you have described here:

I ran another query off of that one and applied "totals" (grouping) to the query in design view, with period 1 & period 2 set to sum to compress the dataset and remove duplicates, but that just seems like a very gaumy inefficient way to get this done.

There is nothing wrong with this method - that is exactly how you deal with relationships like this.
You will find that Aggregate Queries are one of the most powerful tools you use in database queries, and are your good friend!
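
For illustration, the join and the Totals step can also be folded into a single aggregate query. This is only a sketch, assuming the table, query, and field names from the SQL posted above. The aliases b and e are added here for brevity, and only three of the block columns are kept. Access SQL View does not accept comments, so the notes are given here rather than in the code. Because the join already matches on BlockCode, each IIf only needs to test the period. Where a block has no effort row, the period is Null, IIf falls through to 0, and the Sum is 0.

```sql
SELECT b.ModifiedBlockType,
       b.CoordRegio,
       b.BlockCode,
       Sum(IIf(e.WINTER_TIME_PERIOD = "Period_1", e.SumOfMaxOfFINAL_DURATION_HOURS, 0)) AS [Period_1 Effort],
       Sum(IIf(e.WINTER_TIME_PERIOD = "Period_2", e.SumOfMaxOfFINAL_DURATION_HOURS, 0)) AS [Period_2 Effort]
FROM Atlas_Winter_Blocks_ArcMap AS b
LEFT JOIN [Final Grouping 2 Query] AS e
    ON b.BlockCode = e.BlockCode
GROUP BY b.ModifiedBlockType, b.CoordRegio, b.BlockCode;
```

Every column in the SELECT list that is not inside Sum() has to appear in GROUP BY, so any additional block columns would need to be added in both places.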

Maggie Barr · Apr 14, 2021

Joe4 said:
OK, so it appears that everything is in order, and the issue (not really an "issue", but more of a "situation" due to the data structure, which is fine) is that you are dealing with a one-to-many relationship between the two tables. The "BlockCode" field that you are joining on is unique in the one table, but can be duplicated in the other table. So naturally, the query between those two tables will result in duplicate BlockCodes since they are already duplicated in your second table.

How to deal with that? Exactly as you have described here:

There is nothing wrong with this method - that is exactly how you deal with relationships like this.
You will find that Aggregate Queries are one of the most powerful tools you use in database queries, and are your good friend!

Joe4,
AWESOME!!! Thank you! I knew that structurally the data does this, as I have dealt with it in Excel Power Query with the same issue and had to perform essentially the same thing. I had just hoped that there was some way in the formula to force it to "insert" the data for the row, thus not creating a duplicate row. I am glad to hear that there is nothing wrong with my method, actually super psyched as I don't consider myself to be "trained" in any of this. I use the "totals" aka grouping a ton in my data for many manipulations. My files are quite large and I have multiple Access databases to process the data because of file size caps of the program (8 in entirety for just the breeding data, then I have the winter data), so I always try to ensure I am doing things as efficiently as possible to keep memory use and file size as small as possible.
Thank you so much for reaching out to help and letting me know I had in fact done what needed to be done appropriately and as efficiently as possible.
Best Wishes,
Maggie Barr

Joe4 · Apr 14, 2021

You are welcome.
Yes, the query itself is really not creating any duplicates, it is just mirroring the duplicates that already exist in the second table.
Aggregate queries can be used to group records to "compress" that data down to one record per group.
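
A crosstab query is another kind of aggregate query, and it can turn the period values into columns directly. A minimal sketch, assuming the effort query name and field names posted earlier in the thread, with the alias e added here only for brevity. The In list fixes the column order and keeps both period columns present even if one period has no rows. Access SQL View does not accept comments, so none are included in the code.

```sql
TRANSFORM Sum(e.SumOfMaxOfFINAL_DURATION_HOURS)
SELECT e.BlockCode
FROM [Final Grouping 2 Query] AS e
GROUP BY e.BlockCode
PIVOT e.WINTER_TIME_PERIOD In ("Period_1", "Period_2");
```

Saved as a query, this returns one row per BlockCode, so a LEFT JOIN from the block table to it stays one-to-one. Blocks or periods without effort come through as Null rather than 0, which an expression such as Nz([Period_1], 0) would convert.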

Sometimes you will see duplicate records created when there are not any duplicates in the underlying table. This often happens when the two tables need to be joined on multiple fields, but the person has only joined on one. A simple example of this is if you were joining on names, and there were separate fields for first and last name. If you just joined on last name, that is not enough, because multiple people could have the same last name, so it may create unintended duplicates.
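
The name example might look like the following sketch. The table names People and Addresses and their fields are hypothetical, made up only to illustrate the point. Joining on LastName alone would pair every "Smith" in one table with every "Smith" in the other. Joining on both fields keeps each person matched to their own record, provided the first and last name together are unique.

```sql
SELECT p.FirstName, p.LastName, a.City
FROM People AS p
INNER JOIN Addresses AS a
    ON (p.LastName = a.LastName) AND (p.FirstName = a.FirstName);
```

In design view, the same join is made by dragging a join line between each pair of matching fields.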
