  1. #1
    Member Since Aug 2013

    Duplicate Suppression vs. Deletion

    As many of you may know, Yext recently launched a duplicate suppression solution that can handle any number of duplicate listings at 45 of our publishing partners in the Yext network. The process usually takes only a few hours to a couple of days, although some exceptions can take much longer, for example when there are lots of conflicting data points, such as a new business name, an old business address, a new business phone, etc. It is a phenomenal tool -- one that is best demonstrated live by a member of the Yext team.

    Duplicate suppression launched exclusively to our Yext Certified Partners over the last several months, and we have received many questions about how duplicate suppression works and how it differs from duplicate deletion. This is a very common question, and it is also often asked of publishers, data aggregators, and anyone else with a database compiled from many different sources.

    First, let's define the suppression of a business listing record as the identification and storage of that record in order to prevent the dissemination of its information. Deletion, on the other hand, is defined as the full removal of a business listing record from a database.

    Before we go any further, I invite you to download the white paper by Andrew Shotland, The Definitive Guide to Duplicate Listings. It is an excellent guide to the duplicates problem: how it happens, and what can and cannot be done about it. NOTE: That link will cause someone from my team to reach out to you; if you'd rather have a copy without the follow-up, just email me at

    Below, I excerpt a bit from the paper on How Many Duplicates are Created because this is a source of great confusion:

    Common Types of Duplicate Business Listings
    (from Andrew Shotland’s white paper on Duplicate Listings)

    • Self-Created Dupes

    This happens over time when a business does not have a cohesive strategy for managing its business listings. Typical self-created dupes happen when different parties within a business claim or add profiles on various directories and data suppliers without knowing that someone else in the organization has already done so. This can also happen if you use a third-party tool to add a listing to various data aggregators and publishers and the tool does not effectively detect downstream duplicates.

    • Aggregator-Created Dupes

    The business listing data aggregators gather information from a variety of sources (sometimes thousands) to determine a business’ name, address, phone number, etc. The problem is that these aggregators turn out to be not that great at matching up records from various sources (it’s a tough job that even Google struggles with), and during the matching process more duplicates can be created, which in turn can have an ongoing pollutant effect downstream at the publisher level.

    • Publisher-Created Dupes

    At the publisher level itself, duplicates run wild, as publishers have lax matching and data-cleanup policies. Part of the problem is that local directory publishers have business models that are at odds with the average business trying to clean up its dupes, particularly if it is doing so for SEO. The publisher is typically looking to get paid for improving the presence of a local business on its network and is not as concerned with how the business appears in Google. In fact, I would argue a directory publisher has an incentive for the business to not show up well in Google, because then the business will need to buy more leads from the publisher. So getting publishers to police and clean up their own bad data can be slow and ineffective.
    For more technical detail on Aggregator and Publisher-created dupes, see the Solving for Duplicate Listings in Local Search section.

    • Cross-Pollination Dupes

    Because of crawling practices, there is a constant collision problem in the ecosystem, where a publisher or aggregator crawls its way back into more dupes. For example, say an aggregator sends two dupe listings to a publisher. Even if the aggregator later fixes the dupe issue, if it relies on web crawling as a source and crawls the publisher’s site, the dupes can come back to haunt you. Kind of like zombies.

    The list above isn’t exhaustive, and there are in fact many other ways and sources that can cause a duplicate (including competitors submitting misinformation), but it's a great overview of how this is a multi-dimensional problem. In a very non-technical way, the following diagram simply shows that sources of data are constantly resent to, or re-ingested by, aggregators and publishers, often with the same error-filled records.

    Keep in mind that matching logic and matching engines are a science unto themselves and could fill a whole other post. While many people will look at a “Joe’s Pizza” listing and a “Giuseppe’s Pizza” listing and quickly realize that they are obviously the same business at the same address, that type of nuance is incredibly difficult to program at a scale of millions of records. For just a taste of the science, here’s a Wikipedia article that touches upon deterministic record linkage (rule-based matching) and probabilistic record linkage (fuzzy matching). Needless to say, building a platform that can provide perfect matching on hundreds of millions of data records is very difficult. Further, many aggregators and publishers don’t have the infrastructure or manpower to build the enhanced data structures needed to do the heavy lifting in these areas.
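    To make the deterministic vs. probabilistic distinction concrete, here is a minimal, purely illustrative sketch in Python of both styles of matching: a deterministic rule on the phone number, and a fuzzy string score on name and address. The field names and thresholds are my own assumptions for the example, not how any real matching engine (Yext's included) actually works:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized similarity between two strings (1.0 = identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_probable_duplicate(rec_a: dict, rec_b: dict) -> bool:
    """Deterministic rule first, then a probabilistic (fuzzy) fallback."""
    # Deterministic record linkage: identical phone numbers are a hard match.
    if rec_a["phone"] and rec_a["phone"] == rec_b["phone"]:
        return True
    # Probabilistic record linkage: near-identical address plus a passably
    # similar name (thresholds chosen purely for illustration).
    return (similarity(rec_a["address"], rec_b["address"]) >= 0.9
            and similarity(rec_a["name"], rec_b["name"]) >= 0.6)

joes = {"name": "Joe's Pizza", "phone": "414-555-0101",
        "address": "123 Main St, Milwaukee, WI"}
giuseppes = {"name": "Giuseppe's Pizza", "phone": "",
             "address": "123 Main Street, Milwaukee, WI"}

print(is_probable_duplicate(joes, giuseppes))  # True: fuzzy match on name/address
```

    Even this toy version hints at why scale is hard: every new record has to be scored against huge numbers of candidates, and every threshold trades false matches against missed ones.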

    Suppression vs. Deletion

    With Yext, we offer suppression through our service because suppression is the only option to keep “known” duplicates out of the ecosystem.

    When I worked for a major data aggregator, I used to get a phone call at least once a month from an irate business owner or irate professional marketer (on behalf of the business owner) screaming, “Delete my business from your database IMMEDIATELY!” I would have to tell them that, although I could technically delete a row from a table, that would not have the result they were after.

    After calming the owner or marketer down, I would explain that as a data aggregator (and this applies to a publisher, too, for that matter) we were not the SOURCE of the data; we merely aggregated and compiled it. Sources are things like government records, social media posts, crawled data sets, commercial databases, tax record filings, patent portfolio filings, and literally thousands of other primary sources of information. In fact, the term “aggregator” is used specifically to point out that they are not the source, but rather that they aggregate from many, if not thousands, of sources. As such, deletion can cause more problems than it solves.

    When you delete a record of a duplicate (or any record, really), you remove the ability to track that deletion and prevent the record from reappearing. When you ask an aggregator or publisher to delete a record, they typically lose the ability to prevent the primary source from re-creating the duplicate, because they are not the primary source themselves. In the immortal words of George Santayana:

    “Those who do not remember the past are condemned to repeat it.”

    Suppression, on the other hand, is the approach at publishers and aggregators in which a suppression table is created, typically for every business listing, to store the unique identifiers of any listing or data source that has been flagged as a duplicate. This allows the aggregator or publisher to constantly compare new (or repeat) sources of data against both the database they publish from and the suppression table filled with all the rubbish that has been identified in the past.
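    A minimal sketch of that idea, with invented function and table names that don't reflect any publisher's actual schema: suppression keeps a memory of flagged identifiers, while deletion throws that memory away, so a re-ingested feed walks right back in.

```python
published = {}        # live database: source_id -> record
suppressed = set()    # "suppression table": source_ids flagged as duplicates

def ingest(source_id: str, record: dict) -> str:
    """Ingest a record from an upstream source feed."""
    if source_id in suppressed:
        return "blocked"          # known dupe: never re-published
    published[source_id] = record
    return "published"

def suppress(source_id: str) -> None:
    """Flag a record as a duplicate: unpublish it, but remember its ID."""
    published.pop(source_id, None)
    suppressed.add(source_id)

def delete(source_id: str) -> None:
    """Hard-delete: the record is gone, and so is any memory of it."""
    published.pop(source_id, None)

# A dupe arrives, is suppressed, then the same source resends it.
ingest("src-42", {"name": "Joe's Pizza"})
suppress("src-42")
print(ingest("src-42", {"name": "Joe's Pizza"}))   # blocked

# With deletion, the resend simply reappears as a "new" listing.
ingest("src-99", {"name": "Joe's Pizza"})
delete("src-99")
print(ingest("src-99", {"name": "Joe's Pizza"}))   # published
```

    Real systems are of course far more elaborate (identifiers change, matching is fuzzy), but the core asymmetry is exactly this: the suppression set persists, the deleted row does not.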

    Suppression allows us to actually be on the lookout for error records on an ongoing basis; by contrast, deletion all but ensures the data will sneak back in, re-creating duplicates with new unique record identifiers. That means starting all over from scratch: finding the duplicates, collating a list, submitting the list, and trying to remove them individually.

    Perhaps this whole post can be summed up nicely by restating (and horribly demoting) the aforementioned quote to state:

    “Those who delete their duplicates are condemned to repost them.”

    Duplicate Suppression through Yext

    At Yext, we have built a unique infrastructure solution in collaboration with our publishing partners to provide suppression services. These services include the creation of the data structures necessary to support duplicate suppression, so that we don’t ‘delete and repeat’ the same error over and over again.

    Each publisher (currently 45 and counting) that supports duplicate suppression through Yext returns a 301, a 404, or a custom redirect response, with the majority utilizing the 301 redirect. We are happy to provide anyone interested with a document that highlights the individual behavior at each publisher. The mobile applications we work with operate differently as well, so ask our team for the details to see how our suppression service will work for you.

    Also, please keep in mind that even with Yext duplicate suppression in place, new duplicates from new data sources are likely to show up as time progresses for every business listing. You should periodically monitor your clients’ online presence.

    As long as the Yext PowerListing service for an individual location remains active, suppressions enacted through Yext will stay in place at each publisher for that location. While some of our publishing partners can support ongoing suppression, others cannot, because doing so would require them to re-architect their own databases, matching engines, and/or table structures.

    With Yext connected, you control the business listing and the duplicate suppressions around it. However, when Yext is no longer providing service, our ability to lock and protect that information ceases, and the publisher's normal operating data structure steps back in. As with our PowerListings control, after the Yext connection is removed we do not delete data or block the publisher from using the name, address, phone, or the identification of a duplicate in its own processes. Every publisher in our network has a unique data architecture and platform; each chooses the process and implementation that best suits its own business objectives.


  2. #2
    Member Since Jun 2012

    Re: Duplicate Suppression vs. Deletion

    Thanks Christian! No wonder you needed to do a separate post about the issue of suppression vs deletion.

    I can see now that it's so much more complicated than many realize. I for one didn't realize a lot of this, so I'm getting an education here too. Thanks for explaining it.
    LocalSearchForum Linda Buquet .:. Google Local Specialist

  3. #3
    Member Since Jul 2012
    Milwaukee, WI

    Re: Duplicate Suppression vs. Deletion

    Quote Originally Posted by wardchristianj
    [...] please keep in mind that even with Yext duplicate suppression in place, new duplicates from new data sources are likely to show up as time progresses for every business listing. You should periodically monitor your clients’ online presence.
    Hi Christian,

    Thanks for explaining the 'insider terminology' for getting duplicate business listings removed.

    I wanted to clarify that with Yext's duplicate suppression package in place, any new dupes that do arise can also be detected and suppressed by Yext... So, for the 45+ venues that Yext covers with that service, the 'periodic monitoring' would involve just checking the dashboard for new duplicate listings that may have been discovered by the detection system. (I have also seen the email notifications sent out that Yext has discovered a new possible duplicate.)

    We will all need to rephrase our emails when communicating with an IYP venue now... "Hello, Could you please suppress this duplicate and outdated listing for my client? Thank you."

    Russ Offord
