Creating a New Duplicate Detection Rule & Running a New Job

Posted in Duplicate Detection, Dynamics CRM on Friday, June 19th, 2009

There is some ambiguity when it comes to duplicate detection jobs in Microsoft Dynamics CRM 4.0. I just published the rule, so why won’t it let me run the job? I want to make the job I just ran recurring…why won’t it let me do that? In this blog post, I’ll answer some of those questions, and I’ll lay out a duplicate detection process flow to follow in order to find the largest number of duplicates in the system.

The first thing you will need to do before running a job is to create and publish a duplicate detection rule. The rule area is located in the Data Management section of the Settings area of CRM 4.0.

Click on Duplicate Detection Rules, and either select one of the existing rules (if you would like to run a rule on an Account, a Contact, or a Lead), or click New if you would like to create a new rule on a different entity.

For this blog post, I’m going to edit my existing Lead rule. Let’s say I’ve just launched a huge marketing campaign, and I’ve just imported numerous lists of leads into my system. But I accidentally didn’t select the option to not import duplicates. So now there are most likely some duplicates sitting in my system that need to be taken care of, and I need to create a killer dup rule that’s going to go in there and grab all of them. There are a few things to do first to make sure you’re getting the most out of your dup rule:

  1. Look at the Lead form. Which fields are filled out on almost every record? Make sure you are running the rule on those fields. Run an Advanced Find on Leads created that day to see which fields contain data. If you run a rule on fields that may not contain data, the rule won’t pick up the duplicates. Serious bummer.
  2. Make sure your fields are in the same format. This really applies to the phone number/fax fields. Phone number fields are awesome to run rules on (one of my favorites), but they’re really only helpful if the phone numbers are in the same format and are actually populated (hence the first bullet point…). If your data looks like this…
    1. 555.555.5555
    2. [1] 555-555-5555
    3. [1] 555.555.5555
    4. 555-555-5555
    5. 555-5555
    6. 5555555
    7. 0

      …you’re in trouble.

    You may want to think about getting all of the phone number fields in the same format before running a duplicate detection rule on the phone number field. However, if you were smart before beginning to populate your data (wink-wink) and added an auto-format script to the OnChange event in the field properties of the phone number field, you should be good to go.

  3. Consider other criteria than simply the email. The OOTB rule for Lead (and Contact, and Account…) matches on email address. However, if someone has two email addresses, or if the addresses aren’t valid, that’s not going to help you. You can still keep it in the rule to be safe (it is a good idea), but adding more criteria to the rule is a must to catch more duplicates.
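If you can export the list you imported, the first two checks are easy to rough out before you ever touch a rule. The following is only a sketch in Python, run outside of CRM: the `phone` field name, the ten-digit target format, and the helper names `normalize_phone` and `usable_rate` are all assumptions for illustration, not anything CRM provides.

```python
import re

def normalize_phone(raw, default_area_code=None):
    """Return a bare 10-digit string, or None if the value can't be normalized.

    Assumes North American numbers; adjust the lengths for other formats.
    """
    digits = re.sub(r"\D", "", str(raw or ""))
    # Drop a leading country code of 1, as in "[1] 555-555-5555".
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    # A 7-digit local number is only comparable if we can supply an area code.
    if len(digits) == 7 and default_area_code:
        digits = default_area_code + digits
    return digits if len(digits) == 10 else None

def usable_rate(records, field, clean=normalize_phone):
    """Fraction of records whose field holds a value the rule could match on."""
    if not records:
        return 0.0
    usable = sum(1 for r in records if clean(r.get(field)) is not None)
    return usable / len(records)

# The seven sample values from the list above.
samples = ["555.555.5555", "[1] 555-555-5555", "[1] 555.555.5555",
           "555-555-5555", "555-5555", "5555555", "0"]
normalized = [normalize_phone(s) for s in samples]

# Hypothetical exported leads, keyed by an assumed "phone" column.
leads = [{"phone": s} for s in samples]
rate = usable_rate(leads, "phone")
```

With these samples, the first four values collapse to the same ten digits, and the last three come back as `None` because there is not enough information to compare them. A low `usable_rate` is a hint that the field is a poor basis for a rule until the data is cleaned up.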

Once I’m ready to create my rule, I unpublish the existing rule, and I go straight to the Duplicate Detection Rule Criteria. In this case, I’m going to base the rule on the phone number field (hooray for phone number formatting!). I select Address 1: Telephone 1 for the Attribute and Exact Match for the Criteria. (Tip: if you’re doing an Account Name match, it would be a good idea to match on the first characters instead, since people sometimes misspell long names, so match on the first 3-5 characters.) Then I click Save to save the record, select Actions, and then click Publish.
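To see why a first-characters match forgives a misspelling that an exact match does not, here is a small Python sketch of the idea. It is not how CRM implements its criteria; `match_key` and the criterion names are made up for illustration, and the lowercasing and trimming are choices of this sketch rather than a claim about how CRM compares values.

```python
def match_key(value, criterion="exact", n=None):
    """Build the value two records would be compared on for one field.

    criterion: "exact" compares the whole value,
               "first_chars" compares only the first n characters.
    Returns None for empty values, which can never match anything.
    """
    if value is None:
        return None
    text = value.strip().lower()
    if not text:
        return None
    if criterion == "exact":
        return text
    if criterion == "first_chars":
        if not n or n < 1:
            raise ValueError("first_chars needs a positive n")
        return text[:n]
    raise ValueError(f"unknown criterion: {criterion}")

good = "Contoso Pharmaceuticals"
typo = "Contoso Pharmaceutcals"   # misspelled long name

exact_match = match_key(good) == match_key(typo)
prefix_match = match_key(good, "first_chars", 5) == match_key(typo, "first_chars", 5)
```

Here `exact_match` is `False` and `prefix_match` is `True`. The trade-off is that a short prefix also matches unrelated names that merely start the same way, so expect to review the results by hand.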

After you publish the rule, the Duplicate Detection listview will show that it is “Publishing.” So if you attempt to quickly go and run the job right at that very moment, your hasty attempt at finding duplicates will be unsuccessful. You won’t even be able to run a job on the entity you want to. It takes a minute or two for the rule to finish publishing, and then you will be able to run the job against the entity.

After the rule has successfully published, navigate to Leads (or whichever entity you chose), click More Actions, and select Detect Duplicates. Select either For Selected Records or For All Records on All Pages, whichever is your preference.

A window will appear that will allow you to enter information about the job that will run. This includes:

  1. The Name of the job: this is automatically populated for you; however you are free to change this if you like.
  2. The Start Time of the job: this can be set to your preference as well, especially if you are going to be running the job as a recurring one.
  3. The Recurring Behavior of the job: every 30 days, 180 days, etc. (This is optional. Dup jobs can be run as either a one-time thing or a recurring thing.)
  4. And the Email Options of the job: basically whether or not you want to be notified after the job is completed. You can also notify someone else – manager, colleague, etc.

After you have completed entering your information, click OK.

The Dup jobs are located in the Workplace area of your navigation pane in Microsoft Dynamics CRM. Wait until the Status Reason of your particular job says Succeeded, and then double-click the job to open it.

The window that opens initially shows you the basic information of the Dup job. Created by information, owner information…blah, blah, blah. But to see the duplicates, you need to click on View Duplicates on the left-hand side, in the navigation pane. Once you click to see the duplicates, you will see all of the records listed at the top, the originals and the duplicates. The job is going to treat whichever record was entered second as the duplicate, even if the first record has almost no data. You have to personally select which record you want to keep in the system after you run the job in order to keep the record with the best data.
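If you have a lot of pairs to review, one rough way to decide which record has “the best data” is to count how many of the fields you care about are populated. This is only a sketch in Python against exported data; `completeness`, `suggest_master`, and the field names are hypothetical, and the final choice is still yours to make in the CRM window.

```python
def completeness(record, fields):
    """Count how many of the given fields hold a non-blank value."""
    return sum(1 for f in fields if str(record.get(f) or "").strip())

def suggest_master(records, fields):
    """Return the record with the most populated fields (first one wins ties)."""
    return max(records, key=lambda r: completeness(r, fields))

# Two hypothetical Lead records that a job flagged as duplicates.
first_entered = {"id": "A", "lastname": "Smith", "phone": "555-555-5555",
                 "email": "", "city": None}
second_entered = {"id": "B", "lastname": "Smith", "phone": "555-555-5555",
                  "email": "smith@example.com", "city": "Chicago"}

fields = ["lastname", "phone", "email", "city"]
suggested = suggest_master([first_entered, second_entered], fields)
```

In this example the suggestion is record `B`, the one entered second, which is exactly the case where you would not want to keep the older record just because it came first.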

As you can see, my phone number job ran great. It picked up the duplicates in the system that had the same phone number – and as you can see they are definitely the same person. What you want to do next is merge the records together. You do this by making sure the duplicate record in the bottom listview is selected (if more than one potential duplicate is listed, you will need to merge one at a time, so select one record), and click Merge.

The best way to merge records, in my opinion, is to use the Select Master approach. This way you can choose exactly which record you want to keep as the main one, picking the best record for the system based on the existing Owner, on the least amount of potential record transfer, etc. To me the Automatic approach is uncomfortably ambiguous, and ambiguity is not the way to go when it comes to your data if you ask me…

So when you select the Select Master merge option, another window will open. This is where you will select the Master record, and you will select the data you actually want to keep. The really, super cool thing about this is that you can pick and choose based on any field on the form that has data (so if I choose my Master based on ownership, but it doesn’t have any address data, I can easily select the entire Address section from the subordinate record).

After you have the Master record selected, and all the data fields selected that you want to appear in the final record, click OK. The Note in the little yellow horizontal box above says it all. The master record gets all of the other record’s related records (its Contact records, Opportunity records, Contract records, Activity records, etc.) before the subordinate is deactivated.
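The merge behavior described above can be summarized in a few lines of Python. This is a sketch of the idea only, not CRM's merge code: `merge_records`, the dictionary layout, the `parent_id` key, and the `"Inactive"` state value are all assumptions made for the example.

```python
def merge_records(master, subordinate, fields_from_subordinate, related):
    """Model a Select Master merge.

    - The merged record starts as a copy of the master.
    - Fields the user ticked on the subordinate overwrite the master's values.
    - Related records pointing at the subordinate are re-pointed at the master.
    - The subordinate is deactivated rather than deleted.
    Nothing is mutated; new objects are returned.
    """
    merged = dict(master)
    for field in fields_from_subordinate:
        merged[field] = subordinate.get(field)

    reparented = [
        dict(item, parent_id=master["id"])
        if item["parent_id"] == subordinate["id"] else dict(item)
        for item in related
    ]
    deactivated = dict(subordinate, state="Inactive")
    return merged, deactivated, reparented

# Master chosen for its owner, but it has no address data.
master = {"id": "A", "owner": "Kristen", "city": "", "state": "Active"}
subordinate = {"id": "B", "owner": "Imported", "city": "Chicago", "state": "Active"}
related = [
    {"type": "Activity", "parent_id": "B"},
    {"type": "Opportunity", "parent_id": "A"},
]

merged, deactivated, reparented = merge_records(
    master, subordinate, ["city"], related
)
```

After the call, `merged` keeps the master's owner but takes the subordinate's city, every related record points at `A`, and `B` is marked inactive.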

Repeat that step for each of the records in the top listview of the View Duplicates area in the Duplicate Detection Job Results window.

This definitely isn’t the only way to approach Duplicate Detection. As you see in the screenshot above, my phone number fields are filled out. But what if the phone number field on my subordinate record was empty? The job wouldn’t pull the duplicate. That’s where the holes exist in the job. If there are a lot of records in the system that don’t have fields filled out, they won’t be caught by the large majority of dup rules. So it’s good to not rely on one angle when running the jobs. Try a few different angles:

  1. [For the Account entity] Website & City, State, Zip
  2. [For the Account, Contact, or Lead entity] Last Name & Email Address
  3. [For the Account entity] Primary Contact
  4. [For the Account entity] Account Name & Address 1: Street 1
  5. [For the Account entity] Account Name (First 10 Characters) & City, State, Zip [This eliminates the possibility of separate branches across the country or international locations]
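To make the “different angles” idea concrete, here is a small Python sketch that groups exported records by a combination of fields. It is an illustration only: `find_duplicate_groups` and the field names are hypothetical, and the sketch assumes that every field in a rule must match and that a blank field can never match, which is the hole described above.

```python
from collections import defaultdict

def find_duplicate_groups(records, key_fields, prefix_lengths=None):
    """Group record ids that agree on every field in key_fields.

    prefix_lengths maps a field name to the number of leading characters
    to compare (for example {"name": 10}); other fields are compared whole.
    Records with a blank key field are skipped, so they are never caught.
    """
    prefix_lengths = prefix_lengths or {}
    groups = defaultdict(list)
    for record in records:
        key = []
        for field in key_fields:
            value = (record.get(field) or "").strip().lower()
            if not value:
                key = None  # blank field: this rule cannot see the record
                break
            n = prefix_lengths.get(field)
            key.append(value[:n] if n else value)
        if key is not None:
            groups[tuple(key)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

accounts = [
    {"id": 1, "name": "Contoso Manufacturing", "website": "contoso.example",
     "city": "Chicago", "state": "IL", "zip": "60601"},
    {"id": 2, "name": "Contoso Manufacturing Inc.", "website": "",
     "city": "Chicago", "state": "IL", "zip": "60601"},
    {"id": 3, "name": "Contoso Manufacturing", "website": "contoso.example",
     "city": "Denver", "state": "CO", "zip": "80202"},
]

# Angle 5: Account Name (first 10 characters) & City, State, Zip.
by_name = find_duplicate_groups(
    accounts, ["name", "city", "state", "zip"], {"name": 10}
)
# Angle 1: Website & City, State, Zip.
by_website = find_duplicate_groups(
    accounts, ["website", "city", "state", "zip"]
)
```

The name-based angle pairs accounts 1 and 2 and leaves the Denver branch alone, while the website-based angle finds nothing because account 2 has no website. That is the reason to run more than one angle.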

As long as you keep your data clean and populated, the deduping process won’t be too painful, and you’ll most likely catch the large majority of invalid data.

Happy deduping!

7 Responses to “Creating a New Duplicate Detection Rule & Running a New Job”

  1. Larry Combs says:

    Hi Kristen,

    Good information. Thanks! I have one question about a comment in your post.

    You say, “However if someone has two email addresses …”, but you don’t seem to address how to resolve this issue. I have this issue. My Contacts can have up to three email addresses; however, when I try to add a duplicate detection rule to look for duplicates against all three possible addresses, I can’t. If someone enters an email address into the E-mail Address 2 field, then I want to set up a rule to see if that email address is already in another Contact’s E-mail, E-mail Address 2, or E-mail Address 3 fields.

    It does not seem that CRM 4.0 allows this, since when you try to create a rule and the Base Record Type is the same as the Matching Record Type, it only allows you to pick one attribute. That is, I can match E-mail against E-mail, but I cannot match E-mail against E-mail Address 2.

    Thanks in advance for any help.

    • Kristen O'Connor says:

      When I’ve been creating DD rules, I’ve never been able to check duplicates across varying fields. It’s always something like “Address 1 = Exact Match,” or “Main Phone = Match First Characters (5),” etc. I don’t think that the DD rules have the functionality where you can check against two completely different fields. Because then it would, in theory, need to give you the ability to do a check against the “Main Phone” field and the “Email Address” field, unless it was previously defined in the system.

      And when you’re merging the records, you don’t have the ability to merge the “Main Phone” field with the “Email Address” field. So the rule shouldn’t technically point out duplicates that you won’t be able to reconcile.

      Does this make sense? I’ll ask some colleagues and do some additional research just in case I’m wrong, but as far as I know there is no way to check dups the way you are hoping to. My reason for pointing it out was just to say that if “Email Address 1” matches, then you would know that it was a duplicate.

      I hope this helps you – I’ll do my best to see what kind of workaround would be available for cleaning up your data. I’ll post again on this topic.

  2. Danny says:

    Great Article!

    In the screen with heading “View Duplicates”, which view is that, and how can I add a new column to that view?

    Thanks in advance!

  3. As for adjusting the column headings that you see when you click on View Duplicates, it remains a mystery to me. I haven’t found a view under any entity in question (similar to the Associated View of a Contact, for example) that would be related to a Duplicate Detection Job view, & the Duplicate Detection entities in CRM are not customizable.

    If I uncover any additional information, I will post again on the topic.

  4. Dom says:

    As for the view that is used for the duplicate detection job – it uses the lookup view for the entity. However, if you add any ‘related’ entities to that view, it can break the de-dupe viewer. We found this when we added a column to the lookup view and then the dedupe didn’t work. Therefore, you cannot add additional columns to the view. *bug*

  5. Mizrach says:

    How can you configure duplicate detection for Opportunity that will check the following fields?
    - AccountID
    - Request Type
    - Site Address
    - Products

    • Duplicate Detection, while helpful for a large majority of companies, does have its limitations. You could set up a rule that will check for Accounts with the same Account ID and URL by following the steps in the blog, but checking to see if the same Products have been added would be something that Duplicate Detection could not accomplish.
