Acess XP: append and duplicate records

baseball

Board Regular
Joined
Apr 1, 2002
Messages
153
I have two databases of minor league baseball statistics that have the same fields. The first database spans a period of about 50 years and only contains data for players in a specific major league club's farm system. The second database covers a 20-year span that falls within the period covered in the first one, but it has all players regardless of major league affiliation, so it will include duplicate records.

My plan is to add a field to the first database and filling it with "yes" to flag all of those players. I'll add the same field to the second database, but not fill it.

What happens when I append the data from database_2 to database_1? Will I get duplicate records? For example:

db1
NAME YEAR TEAM AFFIL
Doe,J 1950 Rome yes

db2
NAME YEAR TEAM AFFIL
Doe,J 1950 Rome

After Append, do I get
Doe,J 1950 Rome yes
Doe,J 1950 Rome

or, just
Doe,J 1950 Rome yes

Also, is there anything to be gained by making the AFFIL field a Yes/No data type, or would simply "y" be easier to work with for queries?

Thank you,
Cliff
 

Excel Facts

How to show all formulas in Excel?
Press Ctrl+` to show all formulas. Press it again to toggle back to numbers. The grave accent is often under the tilde on US keyboards.
Hi,

If you do what you proposed then you will get duplicate records. I'm assuming you don't want duplicate records so there are 2 things you can do.

One is to identify duplicate records and delete them after the event. This is not my preference and the use of a yes/no variable indicates this is what you are thinking of doing. BTW, yes/no versus "y"/"n" - there is no real difference altough I presonally would have gone for the yes/no.

The alternative is to prevent the duplicate records from getting into the table and is my preferred solution (it will be better for your database in the long run). The easiest way to achieve this is by the use of "keys". If there is a unique identifier of some sort then you can prevent duplicate records by identifying the unique variable (or in your case a combination of variables that gives you something unique) and making that your primary key in both tables.

It looks like the combination of the player name, year and team are unique and if you make all 3 fields (in both tables) the primary key and then append the 2nd table to the 1st table, Access won't allow duplicate primary keys and this will avoid the problem of duplicate records. Put simply, Access will not paste the duplicate values (based on the primary key).

If you're not sure about primary keys then there is an MS article here.

HTH, Andrew. :)
 
Upvote 0
andrew93 said:
If you do what you proposed then you will get duplicate records. I'm assuming you don't want duplicate records so there are 2 things you can do.

One is to identify duplicate records and delete them after the event.

The alternative is to prevent the duplicate records from getting into the table and is my preferred solution

It looks like the combination of the player name, year and team are unique and if you make all 3 fields (in both tables) the primary key and then append the 2nd table to the 1st table, Access won't allow duplicate primary keys and this will avoid the problem of duplicate records. Put simply, Access will not paste the duplicate values (based on the primary key).
Andrew, there are a couple of problems with your solution. First, there may indeed be duplicate entries. A player may have been traded during the season from one team in a league to another, then later that same season traded back to the fiirst team. This is far from a frequent occurence, but it does happen and I have more than 100,00 records, at the moment, to condsier.

Probably more of a problem is data integrity, which surprisingly is not limited to pre-computer days. Names get misspelled, letters get inverted in transcription, first names can range from formal to nickname, to initials depending on who records the data. Some times, there first name is not even known.

Typically in a baseball database, each individual player is given a player ID, usuually based on the first four or five letters of the last name plus the first two of the first name plus a two-digit number to differentiate among duplicates. For example, there was a pitcher named Tommy Johns, and there have been a large number of Johnsons and some Johnstons.

It would seem to me that this is what an update query is for, but I can't figure out how to implement it. What if I were to add a field named Affiliation to each database, set the value to "no" for all records in the main database and to "yes" in the secondary database; could I then use the second one to update those common records? (I'm used to extracting data from databases; not putting it in.)

Thanks,
Cliff
 
Upvote 0
Hi Cliff

If you do a count query, how many instances are there of players being traded more than once in a season? You can get this from a new query, view totals, add the player name (group by), year (group by), team (count) and set the criteria for the team to ">2" (without the quotes). This will tell you exactly how many "problem" records there will be ("problem records" in that they will not be appended if you use my previous suggestion of using the multiple fields as the primary key). If it's not too many then they could be added manually later (we can cover this bit later if you use my original suggestion).

However, now I'm confused, from your first post I thought you wanted to "append" the records from one table into another. In your last post you mentioned "update" so now I'm not sure what you want to do.

Either way, if the data is not consistent and integrity (e.g. nicknames versus formal names) is poor then you will have a hard time matching records given the variety of spellings plus the high likelihood of players with identical or similar names occuring over the years.

But, you did mention the "player id" - is this actually being used in your database? Can you rely on this field to give you a unique player? Or given the example you provided, is it likely that differing players have the same player id? Can you elaborate on this a little more?

Andrew.
 
Upvote 0
Andrew,

Sorry for the confusion viz. append/update. I initially appended data for the years before and after those contained in database1, then I slowly realized that I needed to replace the years in database1 with the updated data.

If my data already had playerIDs, the my task wiould be simpler, but I have to generate them, a task that will be very difficult because of the lack of data integrity. Every player will have a unique playerID. As I mentioned previously, there is a general convention that has been used with other baseball databases, which is to AND portions of the last name and first name, then use sequential numbering to differentiate between dupilcate roots.

The form this database will take will be to have a Master table, that contains the playerID, LastName, FirstName, and other biographical data. There will be only one entry per player here.

Then, there will be tables for Batting, Pitching, Fielding, which will be linked by playerID to the Master table. There will be duplicate listings by playerID: one for each player-team-season. These tables wiill contain information about year, team, league, games played, at bats, innings pitched, putouts, etc.

I have a model on which to base this work, but I need to learn how to get disparate forms of data into the appropriate tables.

A family matter has come up, and I have to go out of town on Friday for about a week, so I'm not going to be able to put any of your suggestions into practice until my return.

Thanks,
Cliff
 
Upvote 0

Forum statistics

Threads
1,221,828
Messages
6,162,215
Members
451,752
Latest member
freddocp

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top