Better Data, Better Business: How AI Overcomes the Problem of Bad Data

Every eCommerce executive lives with a lingering aspiration: if only I understood what my customer wants to buy when they visit my site. This dream is quickly pushed aside because the information is thought to be unattainable. Having it would let retailers stop guessing when they design site experiences. Guessing will always be imperfect, filled with well-intentioned hunches and the promise that “we will get it right next time.”

Focusing on the purchase intent of the individual consumer and responding in real time is a perfect dream, but it is interrupted by the reality of poor intent data and slow responses. In today’s post, we’re going to address the bad data.

Data is indispensable for an eCommerce business, especially if that business wants to take advantage of a machine-learning-based customer engagement platform to optimize the customer experience. In today’s eCommerce landscape, the customer experience is everything, and in order to convert browsers into buyers, you need to be able to accurately understand customers’ needs and buying behaviors to identify intent correctly. Having a wealth of data is what yields such insights, and this is key for eCommerce businesses wishing to compete and grow.

Unfortunately, the data that you have to feed into a personalization system is never ideal, and often very low quality. The problem with this kind of “bad data” is that even though there’s enough of it to analyze, attempting to do so can be like comparing apples to oranges. Or streetcars to donkeys. We are all familiar with the expression “garbage in, garbage out”, and there is much truth to it.

Many retailers believe that they first need to fix their data, making it apples to apples, before implementing a customer engagement solution that uses machine learning to optimize in real time. But the truth is that the advanced artificial intelligence driving a robust customer engagement solution is exactly what you need to combat bad data, without manual effort or expertise in data analytics. While perfect data is never possible, AI can transform your bad data into high-quality data, which can fuel better business decisions and faster growth.

Every eCommerce business faces the problem of bad data at times. Let’s first take a look at why some data can be low-quality, and then how it can be mitigated and optimized for analysis.

What Makes Data Bad?

Bad data is data that is unstructured, incomplete, or inconsistent. Because data volume is critical for scaling and profitability in eCommerce, a rule of thumb is that the more data you have, the better. Volume alone is not enough, though: what you really want is more high-quality data.

Data can be low-quality because it is merged from multiple sources with different data structures, fields, and naming conventions. Some products may have their most useful information in a description column that also contains HTML markup, junk words, and other noise. Other products carry their useful information as a list of keywords. Still others may give you little to work with beyond a high-quality image.

Another issue is a lack of standardization: some products may list their color as “red”, whereas others call it “rouge”.
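To make the standardization problem concrete, here is a minimal sketch of mapping free-text color values onto one canonical name. The synonym table and the normalize_color function are hypothetical and purely illustrative; this is a hand-written lookup, not how Reflektion’s AI handles it.

```python
# Hypothetical synonym table; a real catalog would need a far larger one.
COLOR_SYNONYMS = {
    "rouge": "red",
    "crimson": "red",
    "navy": "blue",
    "azure": "blue",
}

def normalize_color(raw):
    """Map a free-text color value onto a canonical color name."""
    key = raw.strip().lower()
    # Unknown values pass through unchanged (lowercased and trimmed).
    return COLOR_SYNONYMS.get(key, key)

products = [
    {"sku": "A1", "color": "Red"},
    {"sku": "B2", "color": " rouge "},
    {"sku": "C3", "color": "Navy"},
]
normalized = [{**p, "color": normalize_color(p["color"])} for p in products]
print(normalized)
```

After this step, products A1 and B2 share the same color value and can be compared directly. The obvious weakness of a manual table is that someone has to keep it up to date as new products arrive.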

This becomes even more difficult as new products are added to your mix, bringing a constant stream of new data to be integrated and analyzed. At times, it can seem next to impossible to keep up with this stream. Data quality also suffers when some new products arrive with no data at all.

This poses a problem because bad data muddies your view of your customer. If you can’t tie together all the attributes of your data, you can’t spot similarities and trends, or see which products complement one another.  

In addition, if your search engine indexes free-text fields like the description, it might return products that match only on noise words like “your style” or “inspired by”.
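As a rough illustration of the noise problem, the sketch below strips HTML and a short list of marketing phrases from a description before it would be indexed. The phrase list and the clean_description function are assumptions made for this example, and the regular expression is a naive tag stripper, not a full HTML parser.

```python
import html
import re

# Hypothetical list of marketing phrases that carry no product information.
NOISE_PHRASES = ["your style", "inspired by"]

def clean_description(raw):
    """Strip HTML tags, decode entities, and drop known noise phrases."""
    text = re.sub(r"<[^>]+>", " ", raw)       # replace each tag with a space
    text = html.unescape(text).lower()        # decode entities such as &amp;
    for phrase in NOISE_PHRASES:
        text = text.replace(phrase, " ")
    return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace

raw = "<p>Show off <b>your style</b> in this zebra print tee, inspired by the safari.</p>"
print(clean_description(raw))
```

A search for “your style” would no longer match this product, while “zebra print tee” still would.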

Not only does this put you at a competitive disadvantage, but your customer experience will also suffer greatly. Even if you are an advanced data analyst (which most retail marketers and merchandisers are not), when you put bad data in, you will get bad results out.

So what can you do if your data isn’t of a high enough quality? That’s where Reflektion’s AI comes in.

How Reflektion’s AI Turns Jumbled Data into Gold

The AI that powers Reflektion’s customer engagement platform is sophisticated enough to optimize your data on the fly, turning unstructured data into structured data. Structured data is highly organized and formatted in easily searchable ways, sorting information into fields that are used uniformly. Unstructured data is more nebulous, with no pre-defined format or organization. In an eCommerce context, that might include, for example, words that customers search for using everyday language. That type of information is a hot commodity if you want to convert more buyers, as it is a direct window into what your customer wants. But in great quantities, that type of information can be difficult to collect, process, and put to use.

Turning unstructured fields into structured fields

Reflektion uses state-of-the-art machine learning to analyze unstructured data fields such as product name and description, along with the product images, to extract structured data points. Reflektion has created a standard, vertical-specific taxonomy for categories and product attributes. So when we come across a product name or description like “sleeveless zebra print tee”, we will generate the following data points for this product:

category: tops

subcategory: tees

pattern-family: animal-print

pattern: zebra-print

sleeve-type: sleeveless

Furthermore, analysis of the product image might result in more data points like:

color-family: pink

color: baby-pink

neckline: deep neck

These structured data points are internally called “Rfk tags” and the project/engine responsible for generating these tags is called “Rfk Tagger”.

These structured data points not only make it easier to infer a user’s affinity towards specific product categories or attributes, they also help to improve the quality of similar item recommendations.
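The idea of generating tags from a product name can be sketched with a simple keyword lookup. This is a deliberately simplified stand-in: the Rfk Tagger is described as machine-learning based, whereas the KEYWORD_TAGS table and tag_product function below are hypothetical and hand-written for illustration.

```python
import re

# Hypothetical keyword-to-tag table, modeled on the example tags above.
KEYWORD_TAGS = {
    "zebra print": {"pattern-family": "animal-print", "pattern": "zebra-print"},
    "sleeveless": {"sleeve-type": "sleeveless"},
    "tee": {"category": "tops", "subcategory": "tees"},
}

def tag_product(name):
    """Return structured tags for every known keyword found in the name."""
    text = name.lower()
    tags = {}
    for keyword, keyword_tags in KEYWORD_TAGS.items():
        # Word boundaries keep "tee" from matching inside "teen".
        if re.search(r"\b" + re.escape(keyword) + r"\b", text):
            tags.update(keyword_tags)
    return tags

print(tag_product("Sleeveless zebra print tee"))
```

A lookup like this only covers phrases someone has listed in advance, which is exactly the manual effort a learned tagger is meant to remove.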

When we come across “striped” listed as a color, or “polka dot” in a description, we mark them both as patterns. This creates a new structured way to compare products and to remember a user’s affinity.
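Once products carry structured tags, remembering a user’s affinity can be as simple as counting. The sketch below is one assumed way to do it, with a hypothetical affinity function and made-up browsing data; it is not a description of Reflektion’s actual profile model.

```python
from collections import Counter

def affinity(viewed_products):
    """Count how often each (attribute, value) pair appears in viewed products."""
    counts = Counter()
    for tags in viewed_products:
        for pair in tags.items():
            counts[pair] += 1
    return counts

# Made-up tags for three products a shopper viewed.
viewed = [
    {"category": "tops", "pattern": "striped"},
    {"category": "dresses", "pattern": "striped"},
    {"category": "tops", "pattern": "polka-dot"},
]
profile = affinity(viewed)
print(profile[("pattern", "striped")])
```

Here the shopper’s interest in striped items shows up across two different categories, something that is hard to see when “striped” is buried in a free-text field.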

Here is a dress and its default similar products, computed using only the structured fields the retailer provided.

[Image: the original dress and its default similar products]

Here are the products recommended by Reflektion TSI (tag-based similar items), an algorithm that relies on our internal tagging capability to turn unstructured data into structured data.

[Image: similar products selected by Reflektion TSI]

Notice that the cuts of the above dresses match the original dress much more closely.
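To give a feel for how tags can drive similar-item recommendations, here is a minimal sketch that ranks products by the overlap of their tags (Jaccard similarity). This is a generic technique chosen for illustration, not Reflektion’s TSI algorithm, and the tag names, SKUs, and functions are all hypothetical.

```python
def jaccard(tags_a, tags_b):
    """Share of (attribute, value) pairs that two products have in common."""
    a, b = set(tags_a.items()), set(tags_b.items())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_items(target, catalog, top_n=2):
    """Return the SKUs of the top_n products most similar to the target."""
    scored = [(jaccard(target, tags), sku) for sku, tags in catalog.items()]
    # Highest score first; ties are broken alphabetically by SKU.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sku for _, sku in scored[:top_n]]

target = {"category": "dresses", "silhouette": "a-line", "sleeve-type": "sleeveless"}
catalog = {
    "D1": {"category": "dresses", "silhouette": "a-line", "sleeve-type": "sleeveless"},
    "D2": {"category": "dresses", "silhouette": "sheath", "sleeve-type": "sleeveless"},
    "T1": {"category": "tops", "sleeve-type": "sleeveless"},
}
print(similar_items(target, catalog))
```

Because the cut is an explicit tag, the a-line dress ranks above the sheath dress even though both sit in the same category.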

Now that the data is structured and can be compared, Reflektion can put it to use in real time in search, product recommendations, category pages, and more.

No Data? No Problem.

Another frequent problem in eCommerce is a product that comes with only a few photos and no descriptive fields at all. How do you turn that scant information into high-quality, structured data that is ready for analysis?

Reflektion’s AI is able to extract usable information from the images themselves. It can automatically and accurately recognize colors, for example, populating or updating a field if it detects a magenta or pink sweater. It can also identify textures, which could mean determining whether a coffee table is made from metal or wood. Patterns are also easily discernible for Reflektion’s AI, which can, for example, tag throw pillows as solid, striped, or floral. This gives much richer data to analyze than if “sweater,” “coffee table,” and “pillow” were all you had to go by.
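As a toy illustration of filling a color field from an image, the sketch below averages a handful of pixel values and picks the closest named color. Real image recognition relies on trained vision models; this example assumes the product’s pixels have already been isolated, and the palette and function names are hypothetical.

```python
# Small illustrative palette of sRGB values for common color names.
PALETTE = {
    "pink": (255, 192, 203),
    "magenta": (255, 0, 255),
    "red": (255, 0, 0),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
}

def mean_color(pixels):
    """Average a list of (r, g, b) tuples channel by channel."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

def nearest_color_name(rgb):
    """Pick the palette name with the smallest squared RGB distance."""
    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name]))
    return min(PALETTE, key=distance)

# Assumed sample of pixels taken from the sweater region of a product photo.
pixels = [(250, 185, 200), (255, 195, 205), (248, 190, 198)]
print(nearest_color_name(mean_color(pixels)))
```

The resulting name could then populate an empty color field, giving a product with no description at least one structured attribute to compare on.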

These simple examples just scratch the surface of all the ways Reflektion can correct bad data with AI.

Continuing the example above, here are the dresses shown when tag-based similar items (TSI) is combined with similar colors, which Reflektion deduces automatically using image recognition.

[Image: similar dresses selected using TSI combined with color similarity]

Better Data Leads to Better eCommerce Strategies

By improving the quality of your data, Reflektion’s customer engagement platform can uncover hidden relationships across your mix of products and attributes as well as your customers’ subtle preferences. In today’s market, the ability to make data-driven decisions is your armor, and we are constantly broadening the datasets included in our Individual Customer Profile to include such categories as shopper geolocation, clickstream interactions, and purchase history.

When you have AI in your arsenal, you don’t need to be a data or analytics expert to harness the power of the data you collect through your eCommerce site every day. Using Reflektion’s interactive, easy-to-read, and highly visual analytics dashboard, you can have granular, contextual, and historical insights served up in real time, so you can keep better control of your business performance and make data-driven strategic decisions.

When you have the right tools to structure your data and put it to work for you, it can help to reduce costs, enable you to launch smarter and more targeted marketing campaigns, find new product opportunities and offerings, and—most importantly, since this is the twenty-first century—understand your customers on ever-deeper levels.

Not only are we at the dawn of an age in which there is less work to be done to merchandise the data on the site, but AI is also helping you understand your data and improve the data itself. Perhaps the expression “garbage in, garbage out” is on its way to being replaced with “garbage in, gold out”.

If you would like to see how Reflektion’s AI can transform your bad data into a treasure trove of relevant shopper experiences and actionable insights, book some time with our team.


Shai Tamari