Time for a Grand Debate? MLS, Syndication, Data Mining

Just how deep does this rabbit hole go?

Something must be in the air. Within a fairly short period of time, I’ve had emails, phone calls, and direct messages about the perils of listing syndication. In the last couple of days, two extremely smart, extremely influential people have written on topics that revolve around the issue of syndication. The first is John Heithaus, the Chief Marketing Officer of MRIS, who wrote Reality Check Ahead: Data Mining and the Implications for Real Estate Professionals. Following up on John’s post is Greg Robertson of VendorAlley, whose post is tantalizing in the title (Syndication Hustle), but somewhat light on actual content. (I’m curious what Greg’s larger point was, and hope he’ll elaborate.)

Combine that with my post on Sami Inkinen of Trulia, and the responses I’ve gotten to that post, and the whole thing is starting to smell like zeitgeist to me. Something is astir in the mists.

It may be time for a grand debate to settle a few key issues. Since John’s post lays out many of the big picture issues, let us start there.

Data Mining, the Human Barcode, and Digital Fingerprints

John’s post is perceptive, deep, and requires some thinking through once you’ve read it. I highly recommend reading the whole thing if you haven’t already.

He begins with a discussion of cutting-edge data mining technologies that are now reality for companies like Google:

Then, a Wall Street Journal article titled “What They Know” was posted which discusses how companies are developing ‘digital fingerprint’ technology to track our use of individual computers, mobile devices and TV set-top boxes so they can sell the data to advertisers. It appears that each device broadcasts a unique identification number that computer servers recognize and, thus, can be stored in a database and later analyzed for monetization. This 3-minute video is a must-see!

By the way, they call this practice “Human Barcoding.” KB began to squirm. As we all should.

As it happens, while I’m not deeply conversant in these techniques or technologies, I have been aware for some time of the data collection efforts of companies like Google thanks to my friend Gene Yoon, founder of Bynamite, a company focused on dealing with issues of data privacy, data collection, and the like. “Human Barcoding” sounds frightening, but it’s really just an extension of practices as old as the Internet itself: the HTTP protocol sends all sorts of information about the computer, the browser, etc. And of course, cookies have been with us for years and years. And in the offline world, consumers have been presenting their little grocery discount cards or drug store membership cards for years and years, blithely accepting the fact that the retailers would now know a whole lot about what they’re buying, and when, and where, in exchange for some discounts.

I guess I see value-laden terms like “Human Barcoding” and “Digital Fingerprinting” as fairly non-subtle ways for the media and the privacy advocates to make people afraid of something sinister. And maybe they should be afraid, but this post isn’t about whether such techniques are good or bad.

What worries John, and therefore concerns this post, is that the massive and sophisticated data mining and data analytics operations that allow advertisers to precisely target a particular user on a particular device would be brought to bear on the real estate industry and its property data:

Second, we need to understand how business intelligence and analytics are being applied to the data generated by real estate transactions today. If there is a monetization to the data without the knowledge and permission of the rightful owner, then, potentially, agreements need to be negotiated (or renegotiated) and modified to get in step with today’s (and tomorrow’s) inevitable ways of doing business. I’m not in any way opposed to data mining per se, the issue at hand here is fair compensation for the data on which it is based.

As it happens, I agree with John to a large extent. To the degree one thinks of the listing data as intellectual property, the issue isn’t data mining or monetization but fair compensation.

And yet, that isn’t the only issue, and once we start down this rabbit hole, we don’t know where it comes out. And that may be the reason to have a grand debate about these issues.

How Is Syndication Involved? Tracing the Chain of Logic

At first glance, John’s concerns about cutting-edge business intelligence techniques being applied to real estate data appear to have nothing to do with syndication. Well, Greg’s post is more revealing:

To me, and others, it’s clear that the risks of syndication far out weigh the rewards. Yet brokers continue to sign agreements they never read with fine print they never see. Granted there are some best practices to follow, such as making sure the syndicator’s site has much less information than yours, and to make sure you understand what rights to the data you are giving away.

We also have Victor Lund of the WAV Group, another extraordinarily influential member of the MLS community, weighing in on John’s post:

You highlight a huge problem for real estate professionals. Companies that do not sell real estate have stepped between the professionals (MLSs, Associations, Brokerages, and Agents) and the consumers and are profiting on the backs of what I call curated listing content.

Curated listing content is accurate, complete, and up to date – backed by the professionals who painstakingly manage it every day. It is the most valued asset of real estate professionals, yet through the efforts of syndication it is being given away for free in hopes of attracting willing buyers and sellers. The reality is that less than 10% of traffic to agent/broker/mls websites come from syndication, and an even lower percentage of willing buyers.

Again, all true as far as the comment goes (although, I’m not sure how the MLS itself is not a “company that does not sell real estate” profiting on the backs of curated listing content).

Once we realize the issues, the chain of logic goes like this:

  1. Publishers are profiting from listings data.
  2. Listings data is sent to these publishers by syndication companies.
  3. The listings data that is sent via syndication is “curated listing content” that is accurate, complete, and up-to-date.
  4. “Curating” this data is neither easy nor free.
  5. No one gets paid by the publishers for this usage of data.

Once data mining techniques make listings data even more valuable than it is today (which is precisely the controversy around RPR), then the companies that are responsible for curation, namely the MLS, are creating valuable intellectual property… then giving it away to enrich Sergey Brin some more.

One suggested solution — at least by Greg and Victor — is to stop syndicating listings data. The publishers would then have to make do with “non-curated” listing data, which is to say, crap — or they would need to invest the time and money to do the curation themselves, or they would have to pay the MLS license fees based on intellectual property principles and practices.

And now we come to the issue at the bottom of the rabbit hole.

Has the Raison d’Etre of the MLS Changed?

During a panel discussion at Inman NYC this past January, Art Carter, the CEO of CRMLS, made what I thought was an absolutely brilliant, and absolutely critical point. He was talking about whether MLSs can be strategic, and he said that the first question he and his Board started with was: Do we have a right to exist?

It’s an amazing question for the CEO of a major MLS and his leadership team to be asking. And yet, it is the proper one to ask. If the MLS is providing no value, and is just a financial burden on its customers who often have no choice but to join, then it probably should not exist. Art and his leadership answered that question in the affirmative: Yes, we have a right to exist. I would probably be a little stronger: You have not only a right to exist, but a duty to exist.

The follow-up question, once you have established the right to exist, is: What is the value that we are providing such that we have a right (and a duty) to exist?

This question lies at the heart of the issue of syndication, data mining, and the MLS.

Historically, the MLS was little more than an exchange. Whether it was a group of brokers getting together at a local watering hole, or printed books, or telex machines, the historical value of the MLS was as a way for brokers to (a) agree to cooperation and compensation, and (b) get accurate information to each other to enable cooperation and compensation. The fortuitous organic development out of that is the compliance function of the MLS — the actual mechanism by which listing content is curated.

For years and decades, the value of the MLS to the real estate community was as this convenient exchange in which payment was guaranteed via contract between the members. Until the advent of the computer and the Internet, I don’t know that brokers and agents and anyone else cared all that much about all of the wonderful data that this exchange was throwing off. Syndication was not a widespread reality, nor was it particularly practical, prior to digitization of information.

So we must ask, why was the MLS so valuable to brokers prior to the digital age? The answer: because the MLS was the primary way that a listing broker could attract more buyer attention to the property he was charged with selling for a client.

At a time when almost all buyers went to a real estate agent to have him find a house, the MLS provided the best way to get more buyers to a property than any other advertising medium. Sure, brokers did newspaper ads and flyers and whatever else, but when consumers go through the professional to find a property, making sure that other professionals know about your listing is paramount.

With the advent of the Internet, consumer behavior shifted: more and more, consumers were going directly to the Web to find the property instead of going through a real estate agent. Errol Samuelson gave a presentation at NAR Mid-Year in 2010 (I have a copy on file; can’t find it online) where he pointed out that in 2009, the number of people who had found the home they purchased through the Internet finally equaled the number of people who found it through a real estate agent. And the trend is clearly going towards more self-help.

In this environment, it is not at all clear that making sure other professionals know that you have a house for sale is the paramount value of an MLS. I asked a broker friend of mine at a recent conference what he would do if somehow, his MLS went down for the count for a long period of time, say 60 days. He replied, without hesitation, “I’d just use Zillow I guess”. I’d imagine that if I were employed by an MLS, those words would make me a wee bit worried. But then he followed up with, “But Zillow’s data is crap, so it’d be a pain in the ass.”

Has the value of the MLS changed for the broker? If so, the reason for an MLS to exist, the value that it delivers to its paying customers, has fundamentally changed.

The Grand Debate

Which leads us to the grand debate itself. The question that must be asked, debated, explored, and answered by all of the participants (brokers, agents, franchises, syndicators, publishers, Associations, and of course, the MLS) is whether the primary value of the MLS has changed for the broker. (I’m using broker here, although there are some real questions whether the broker still is the customer of the MLS, instead of the agent.)

If the primary value of the MLS remains that of advertising a listing for the purpose of bringing more buyers to the property, then syndicating those listings follows as day follows night. Secondary value, such as data monetization, is just that: secondary. It cannot take precedence over the primary reason why brokers and agents continue to pay subscription fees.

If, on the other hand, the value of the MLS has become that of having curated, accurate data that professionals can rely upon in conducting business, then advertising the listing itself for more buyer attention becomes secondary. In that scenario, syndicating the listing is foolish, business suicide, and ultimately, doing disservice to the members who spend a great deal of time and money and energy to ensure that they are putting in the most accurate and up-to-date information on a property.

You can’t have it both ways, because the values are in conflict. Both advertising the listing and having accurate data are important values, but they cannot both be first. Only one can be the primary value, and the other subordinate.

So which is it?

The Rabbit Hole Goes On

Oh, if only the rabbit hole would stop there.

Suppose for the moment that the grand debate is answered as Advertising. There are some consequential questions that need to be answered:

  • Why should there be a difference between native MLS data and syndicated data? For example, one of the hottest topics in the industry, thanks to RPR, CoreLogic, and Move.com, is whether to send these companies the off-market information: actives, pendings, under contract, etc.
  • Aside from legal issues, is there any reason to keep some data “private”, such as Days On Market or listing history? If the goal is advertising a listing to attract more buyers, is there some reason to keep such pertinent information from the public, who are, after all, doing the search themselves?
  • If the primary value of the MLS is advertising, is there any reason for an MLS not to have a public-facing consumer website?
  • If the MLS’s primary value is advertising, are they doing too many other things that aren’t related to the core mission? (E.g., transaction management, offering software, websites, running data centers, etc.)

But now suppose the debate gets resolved the other way, and the primary value comes to be understood as data accuracy as a result of the compliance function of the MLS. These questions come to mind:

  • Without question, syndication should cease, except under negotiated license agreements from various publishers and users of data (like the Wall Street banks that RPR is targeting). But all of the content, this valuable intellectual property, is created by the members/customers/users of the MLS. Why should they be paying a subscription fee to do work to create the data? Shouldn’t payment go the other way?
  • How can any MLS justify having IDX?
  • How can any MLS not have a fairly significant legal team to enforce the intellectual property rights against scrapers and other data thieves?
  • Can you allow Google to index the listings themselves, no matter how beneficial such a thing may be for SEO?
  • Should buyer-only brokers/agents, who create no content, be treated differently than the listing brokers/agents who do?
  • MLS-as-Advertising can be extremely local; MLS-as-Data cannot since data mining, business intelligence, and data licensing (together with the legal enforcement that monetizing intellectual property inevitably involves – just ask the RIAA) all require some scale to be meaningful. A 500-member MLS can exist as an Advertising channel since local advertising is what matters; it is completely irrelevant as a Data aggregator. Does the small MLS still have a right to exist?

I’m sure I could think of more questions if I had more time. But I think you get the picture.

These are big issues. Fundamental questions. Strategic challenges. The rabbit hole goes very deep.

Let’s Have That Debate

I am so very glad that John raised the issue, and other industry experts have highlighted it. Let’s have that debate. Whenever and wherever convenient. As a consultant, I know that I have no general opinion on this; to me, the answer depends on the specific client’s situation and strategy. But I do know I can argue both sides of this particular issue, and would love to moderate? curate? encourage? stir-up? that conversation.

Looking forward to diving into the rabbit hole with you all.


Rob Hahn

Managing Partner of 7DS Associates, and the grand poobah of this here blog. Once called "a revolutionary in a really nice suit", people often wonder what I do for a living because I have the temerity to not talk about my clients and my work for clients. Suffice to say that I do strategy work for some of the largest organizations and companies in real estate, as well as some of the smallest startups and agent teams, but usually only on projects that interest me with big implications for reforming this wonderful, crazy, lovable yet frustrating real estate industry of ours.

5 thoughts on “Time for a Grand Debate? MLS, Syndication, Data Mining”

    • That you did, sir 🙂 And that paper is a good one by WAV – but that is one of the key reasons why we need these industry-wide debates. Because the paper sort of assumes that listing data is IP worth protecting, rather than just advertisements the brokers want sent to every corner of the globe in 85 languages…. That’s the interesting issue, no?

  1. “To the degree one thinks of the listing data as intellectual property”

    Why should it be, exactly? Can’t copyright a fact. Can protect the system used to encode the fact: the software, maybe even the names of the fields and so forth. Could at a stretch copyright the photos as a medium of artistic expression, a creative work. But the address, the price, the number of bathrooms: fact, fact, fact. Entering facts into a database is clerical work, not intellectual work. Getting someone to agree to have you be their representative is salesmanship, which may itself require a nimble brain and plenty of smarts, but you’re what the admen call “creative”.

    Real estate agents exist because it was once costly and difficult to obtain information about what homes were available for sale in the area in which one wished to live. The MLS is a cartel which attempts to create a de facto monopoly on that valuable knowledge by limiting its distribution only to its membership. But the knowledge itself is quicksilver, just a set of facts. Once it’s out there you can’t hold onto it. As the cost of obtaining that information drops lower and lower the need for any agent at all becomes less and less. (There will always be a few; most people can manage to buy a shirt all by themselves and yet there is such a thing as a personal shopper.) MLSs could attempt to preserve their monopoly on information by dropping syndication. The New York Route, let’s say. But there ain’t very many places in the country where real estate is as valuable as it is in NYC. In other places, if you don’t want to starve to death before you sell the joint, you gots to advertize, loud and proud, and as it’s 2011 that means the web. If you lose the MLS, the data quality of the listings might get shittier, for a while. But if that gets to be a real problem then ensuring data accuracy becomes a competitive advantage for the consumer-facing sites as well, and I’d be willing to bet they can find some ways of ensuring it. In the meantime, it’s not clear to me that data accuracy is as big of a boon to buyers as it is to agents; nobody likes wasting a phone call only to find that the house you liked is off the market, but it’s far from the end of the world.
