Invalid UTF-8 encoding [description]: Merchant Centre Disapproved Products

This post is part of the Google Merchant Centre Guide - created by our Google Shopping Team

Getting flagged for Invalid UTF-8 encoding on one of your products in Merchant Centre can be frustrating, and it gets in the way of successfully advertising and selling your products through Google Ads. In this blog post, we will run through what UTF-8 encoding is, what can make it invalid, and how to fix it and get your Shopping ads back up and running if it does.

If you prefer video content, please watch this video:

Fixed: Invalid UTF 8 Encoding: Google Merchant Centre Disapproved Products

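What Is UTF-8?

To begin, let’s be clear about exactly what we are dealing with. UTF-8 stands for Unicode Transformation Format, 8-bit. It can encode every character, or code point, defined in Unicode, using between one and four bytes per character: plain ASCII letters take a single byte, while symbols such as © or € take two or three. In other words, it is a standard way of storing text as bytes, and your product data is pulled into Merchant Centre in this format.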
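To make "one to four bytes per character" concrete, here is a short Python sketch that prints the UTF-8 bytes for a few sample characters. The helper name `utf8_bytes` is our own, used only for illustration.

```python
def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 byte sequence for a single character."""
    return ch.encode("utf-8")


# ASCII letter, accented letter, copyright sign, euro sign, emoji.
for ch in ["A", "é", "©", "€", "😀"]:
    b = utf8_bytes(ch)
    print(f"U+{ord(ch):04X} {ch}: {len(b)} byte(s), {b.hex(' ')}")
```

Every byte of a multi-byte character must be present and in the right order, which is why the problems below all come down to bytes that do not follow this pattern.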

Common Causes of Invalid UTF-8 Encoding

UTF-8 encoding is a widely used character encoding standard that allows the representation of a vast range of characters from various languages and scripts. However, despite its versatility, there are certain common causes that can lead to invalid UTF-8 encoding. When dealing with product data and encountering issues like disapproved products in Merchant Centre, understanding these causes can be crucial for resolving encoding-related problems. Here are some common reasons why UTF-8 encoding can become invalid:

Incomplete or Truncated Data:

One common cause of invalid UTF-8 encoding is incomplete or truncated data. If a character or sequence of characters is not fully represented, it can result in an encoding error. This often happens when data is being transferred between systems or during file conversions, and some characters are inadvertently cut off.
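A minimal Python sketch of how truncation breaks UTF-8, assuming a system that limits a field by bytes rather than by characters (the 4-byte limit is arbitrary, chosen for illustration). `truncate_utf8` is a hypothetical helper showing one safe way to cut text.

```python
# "é" takes two bytes in UTF-8, so a byte-based cut can land in the middle of it.
data = "Café table".encode("utf-8")   # b'Caf\xc3\xa9 table'
truncated = data[:4]                  # b'Caf\xc3': the é has been cut in half

try:
    truncated.decode("utf-8")
except UnicodeDecodeError as err:
    print("Invalid UTF-8:", err.reason)


def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    cut = text.encode("utf-8")[:max_bytes]
    # The input was valid, so only the tail of the cut can be incomplete;
    # errors="ignore" drops that partial sequence.
    return cut.decode("utf-8", errors="ignore").encode("utf-8")


print(truncate_utf8("Café table", 4))   # b'Caf'
```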

Mismatched Character Encoding:

When data is processed or stored using different character encodings, the result can be invalid UTF-8. For example, if data was originally encoded in ISO-8859-1 (Latin-1) and is then read as UTF-8, every non-ASCII character becomes a problem: in Latin-1 an é is the single byte 0xE9, which on its own is not a valid UTF-8 sequence.
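The Python sketch below reproduces the mismatch and the usual fix. It assumes you know the source really is Latin-1; if it is actually Windows-1252 (common for data exported from Windows tools), `"cp1252"` would be the codec to use instead.

```python
text = "Crème brûlée"

# Saved by a system using Latin-1: è is stored as the single byte 0xE8.
latin1_bytes = text.encode("latin-1")

try:
    latin1_bytes.decode("utf-8")  # ...but read back as UTF-8
except UnicodeDecodeError as err:
    print(f"Byte 0x{latin1_bytes[err.start]:02X} at offset {err.start}: {err.reason}")

# Fix: decode with the encoding the data was really written in,
# then re-encode as UTF-8 before it goes into the feed.
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")
print(utf8_bytes.decode("utf-8"))  # Crème brûlée
```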

Special Characters and Escape Sequences:

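Special characters and escape sequences, such as control characters or non-breaking spaces, can cause problems with UTF-8 encoding. These are legitimate Unicode characters, but a non-breaking space saved in Latin-1 or Windows-1252 is the single byte 0xA0, which is not valid UTF-8 on its own, and most control characters (everything below 0x20 except tab, line feed and carriage return) are not allowed in XML 1.0 feeds at all.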
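One way to neutralise these characters before they reach a feed is sketched below in Python. The regular expression follows the XML 1.0 rule for the C0 control range; replacing non-breaking spaces with ordinary spaces is our own choice, not a Merchant Centre requirement.

```python
import re

# XML 1.0 only permits tab, line feed and carriage return from the C0 control range.
DISALLOWED_CONTROLS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")


def clean_special_characters(text: str) -> str:
    """Remove XML-illegal control characters and swap non-breaking spaces for normal spaces."""
    text = DISALLOWED_CONTROLS.sub("", text)
    return text.replace("\u00A0", " ")


print(repr(clean_special_characters("Size\u00A0M\x0b")))  # 'Size M'
```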

Invalid Byte Sequences:

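UTF-8 relies on specific byte patterns: a lead byte signals how many continuation bytes (each between 0x80 and 0xBF) must follow it. A stray continuation byte, a lead byte missing its continuation bytes, or a byte that never appears in UTF-8 at all (0xC0, 0xC1 and 0xF5 to 0xFF) makes the sequence invalid, and a strict decoder will reject the text.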
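To find which products are affected, it helps to locate the bad bytes. This Python sketch assumes a line-based feed file (tab-separated or CSV) read as raw bytes, for example with `open(path, "rb").read()`. Splitting on newlines is safe because the byte 0x0A never occurs inside a multi-byte UTF-8 sequence. `invalid_utf8_lines` is a hypothetical helper name.

```python
def invalid_utf8_lines(data: bytes) -> list[tuple[int, int, str]]:
    """Return (line_number, byte_offset_in_line, reason) for each line that is not valid UTF-8."""
    problems = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            problems.append((number, err.start, err.reason))
    return problems


# Line 3 contains a stray continuation byte (0x80); line 4 is a correctly encoded é.
feed = b"id\tdescription\n1\tPlain text\n2\tBad byte \x80 here\n3\tCaf\xc3\xa9\n"
print(invalid_utf8_lines(feed))  # [(3, 11, 'invalid start byte')]
```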

Overlong Encodings:

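Overlong encodings occur when a character is represented using more bytes than necessary, for example encoding "/" (0x2F) as the two bytes 0xC0 0xAF. The UTF-8 standard forbids them because they can be used to slip characters past security filters, so conforming decoders treat them as invalid rather than rendering them.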
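Python's built-in UTF-8 decoder is strict, so a simple check like the sketch below rejects overlong forms. `is_valid_utf8` is our own helper name.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(b"/"))             # True: the shortest form, one byte
print(is_valid_utf8(b"\xc0\xaf"))      # False: "/" forced into two bytes
print(is_valid_utf8(b"\xe0\x80\xaf"))  # False: "/" forced into three bytes
```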

Incorrect Handling of BOM (Byte Order Mark):

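The Byte Order Mark (BOM) indicates the byte order of UTF-16 and UTF-32 text. UTF-8 has no byte order, but some programs still write the three bytes EF BB BF at the start of a UTF-8 file as a signature. Mishandling the BOM or treating it as part of the data can lead to encoding errors, such as an invisible character stuck to the front of the first column name.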
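The Python sketch below shows the effect on a small CSV feed and one standard way to handle it: the `utf-8-sig` codec strips a leading BOM if present and behaves like plain UTF-8 if not. The feed contents are made up for illustration.

```python
import codecs
import csv
import io

raw = codecs.BOM_UTF8 + "id,description\n1,Cotton tee\n".encode("utf-8")

# Decoding as plain "utf-8" keeps the BOM as U+FEFF inside the first header.
plain_header = next(csv.reader(io.StringIO(raw.decode("utf-8"))))[0]

# "utf-8-sig" removes a leading BOM, so the header is just "id".
clean_header = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))[0]

print(repr(plain_header), repr(clean_header))
```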

Improper Validation and Sanitization:

When processing user-generated content, improper validation and sanitization of input data can introduce invalid UTF-8 sequences. This can happen when data is not properly filtered or sanitized before being stored or displayed.
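A minimal Python sketch of validating input at the point it enters your system. Whether to reject bad input or replace it is a policy decision; `validate_user_text` and its `strict` flag are hypothetical names for illustration.

```python
def validate_user_text(raw: bytes, strict: bool = True) -> str:
    """Validate incoming bytes as UTF-8 before storing them.

    strict=True rejects bad input outright (raises UnicodeDecodeError);
    strict=False replaces each invalid sequence with U+FFFD, so the damage
    stays visible instead of silently passing through to the feed.
    """
    if strict:
        return raw.decode("utf-8")
    return raw.decode("utf-8", errors="replace")


print(validate_user_text(b"Nice \xff shoes", strict=False))
```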

Data Corruption during Transfer:

During data transfer, especially over networks, corruption can occur due to various reasons such as packet loss or transmission errors. This corruption can lead to invalid UTF-8 encoding when the received data is not what was originally intended.
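One way to catch corruption in transit is to compare a checksum taken before and after the transfer. The Python sketch below assumes a pipeline you control, for example between exporting the feed and placing it where Merchant Centre fetches it, in which the sending side records a SHA-256 digest. That setup is an assumption, not a Merchant Centre feature.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a feed file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


sent = "1\tCafé table\n".encode("utf-8")
expected = sha256_of(sent)  # recorded by the sending side

# Simulate one damaged byte in transit: the second byte of é is altered,
# which also leaves an invalid UTF-8 sequence behind.
received = sent.replace(b"\xa9", b"\x29")

if sha256_of(received) != expected:
    print("Checksum mismatch: fetch the feed again before uploading it")
```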

Encoding Declaration Issues:

If an encoding declaration in a file (such as an HTML or XML document) does not match the actual encoding used, it can result in misinterpretation and lead to invalid UTF-8 encoding.
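A simple way to keep the declaration and the bytes in agreement is to let an XML library write both. This Python sketch uses the standard library's `xml.etree.ElementTree` (Python 3.8+ for `xml_declaration`). It is a minimal skeleton only: a real Google Shopping feed also needs the `g:` namespace and the required product attributes, which are omitted here.

```python
import xml.etree.ElementTree as ET

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
item = ET.SubElement(channel, "item")
ET.SubElement(item, "title").text = "Crème brûlée set"

# tostring() with encoding="utf-8" returns UTF-8 bytes and writes a matching
# declaration, so the declared and actual encodings cannot drift apart.
feed_bytes = ET.tostring(rss, encoding="utf-8", xml_declaration=True)
print(feed_bytes.decode("utf-8").splitlines()[0])  # <?xml version='1.0' encoding='utf-8'?>
```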

Software Bugs and Glitches:

Bugs in software libraries, parsers, or converters that handle UTF-8 encoding can introduce errors. These bugs might improperly interpret or manipulate the encoding, causing data to be encoded or decoded incorrectly.

Potential Issues with Invalid UTF-8 Encoding

The issue itself is fairly straightforward: if the text Merchant Centre pulls in is not valid UTF-8, it flags the problem on either the title or the description field. In the case we are looking at today, it is the description. So what can make the text invalid and cause Merchant Centre to flag it? Again, the answer is quite simple.

The description can be found in the raw feed attributes on each individual product in Merchant Centre. When Merchant Centre pulls information from a feed or supplemental feed, it can pull the long-form version of a title or product description from whichever platform hosts your store, whether that is WooCommerce, Shopify or anything else. This can include HTML tags such as <h2>, <strong> and <li>; these are just a small sample, but anything in the long-form content will be pulled through. However, tags like these are not what causes the issue. Although they may look complicated to the naked eye, they are made up entirely of plain ASCII characters, so Google can still read them and bring that text into Merchant Centre.

The issue arises with the introduction of non-standard characters. A common example is the © or ℗ symbol. These copyright symbols fall outside the basic ASCII range, so if the source system stores them in a legacy encoding, or they are mangled on their way through the feed, the bytes that reach Merchant Centre are not valid UTF-8. Google cannot interpret them, and this is where the Invalid UTF-8 encoding flag comes from. © and ℗ are not the only characters that cause this, but they are the most common ones in product descriptions. Ideally, the description should be a single paragraph of plain text: HTML markup may not trigger this particular error, but it can still cause complications.
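To track these characters down in your own descriptions, a short Python sketch like this lists every non-ASCII character with its Unicode name, then swaps the two copyright symbols for plain-text equivalents. The replacement strings "(c)" and "(P)" are our own choice, and the sample description is invented.

```python
import unicodedata

# Our own substitutions; adjust to whatever wording suits your catalogue.
REPLACEMENTS = str.maketrans({"\u00A9": "(c)", "\u2117": "(P)"})


def non_ascii_characters(text: str) -> list[tuple[int, str, str]]:
    """List (position, character, Unicode name) for every character outside ASCII."""
    return [
        (i, ch, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


def replace_symbols(text: str) -> str:
    """Swap the copyright symbols for plain-text equivalents."""
    return text.translate(REPLACEMENTS)


description = "Live album © 2020 ℗ Example Records"
print(non_ascii_characters(description))
print(replace_symbols(description))  # Live album (c) 2020 (P) Example Records
```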

Another cause of invalid UTF-8 is your feed or data passing through a third-party website or tool before reaching Merchant Centre. What can happen in this case is that your data gets double encoded: information that is already encoded is encoded a second time. Once it reaches Merchant Centre, Google cannot make sense of it and has no idea what to do with it.
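The Python sketch below shows what double encoding does to a single accented character. Latin-1 stands in for whatever encoding the third-party system wrongly assumed; Windows-1252 is also common and behaves slightly differently for some bytes. `undo_double_encoding` is a hypothetical helper that only reverses this one specific kind of damage.

```python
original = "Café"

# First pass: correct UTF-8 bytes are wrongly read back as Latin-1,
# so each of é's two bytes becomes a character of its own.
once = original.encode("utf-8").decode("latin-1")
print(once)  # CafÃ©

# Second pass: the garbled text is encoded as UTF-8 again (double encoding).
twice = once.encode("utf-8")


def undo_double_encoding(text: str) -> str:
    """Reverse one round of UTF-8-read-as-Latin-1 damage."""
    return text.encode("latin-1").decode("utf-8")


print(undo_double_encoding(once))  # Café
```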

What Problems Can Invalid UTF-8 Cause You?

So what problems does the Invalid UTF-8 encoding [description] error cause in particular? The good news first: if the title is fine and only the description is invalid, your ad can still run. But there is a catch, and it’s a big one. With the description invalidated, Google cannot read it, which severely restricts the search terms your product ads can match. The short title becomes all Google has to work with, and that could mean missing out on a large percentage of your potential audience.

How to Resolve Invalid UTF-8 Encoding Issues

So how is this fixed? Like most problems of this nature, the best solution is to fix the problem at its source. If you can make sure the correct data is pulled from the source in the first place, without the long-form content, it should fix any current issues the next time the feed is fetched, and get ahead of further issues before they become a problem. Alternatively, you could pass the original source data through a processor that strips out the unnecessary markup: run your original feed through it, then have Merchant Centre collect the second, cleaned feed, free of needless tags and characters. While this isn’t strictly fixing the issue at source, it catches it immediately and saves you from having to use any of the workarounds below.
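As an illustration of such a processor, here is a minimal Python sketch built on the standard library's `html.parser.HTMLParser`. It turns a long-form HTML description into one line of plain text; the set of block-level tags that get a separating space is our own assumption, and non-ASCII symbols such as © are passed through unchanged, so they would still need separate handling.

```python
from html.parser import HTMLParser

# Tags after which words would otherwise run together; our own selection.
BLOCK_TAGS = {"p", "br", "li", "ul", "ol", "div", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects text content, inserting a space at block-level tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # &amp; -> &, &nbsp; -> U+00A0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def clean_description(html_text: str) -> str:
    """Turn a long-form HTML description into one line of plain text."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    # str.split() also splits on non-breaking spaces, so they collapse too.
    return " ".join("".join(parser.parts).split())


print(clean_description("<h2>Soft</h2><p>Cotton&nbsp;tee &amp; more</p>"))
```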

If this is not an option, there are other methods you can use to fix this. The best of these is to create a supplemental feed. This is simply a case of entering each product ID in one column and the original long-form description in the next; from there, a simple search and replace can clear out the tags, extra information and characters that aren’t needed. Upload this to Merchant Centre, and it will run alongside the original feed, replacing the long-form description being pulled.
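The search-and-replace step can be scripted. This Python sketch builds a tab-separated supplemental feed with `id` and `description` columns (the Merchant Centre attribute names); the product IDs, the dictionary input shape and the output filename are hypothetical, and the regex-based cleanup is deliberately simple. Check Merchant Centre's own documentation for the exact supplemental feed format it expects.

```python
import csv
import html
import io
import re

TAG = re.compile(r"<[^>]+>")


def simple_clean(description: str) -> str:
    """Search-and-replace cleanup: drop tags, decode entities, tidy whitespace."""
    text = html.unescape(TAG.sub(" ", description))
    return " ".join(text.split())


def build_supplemental_tsv(products: dict[str, str]) -> str:
    """Build a tab-separated supplemental feed with id and description columns."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "description"])
    for product_id, long_description in products.items():
        writer.writerow([product_id, simple_clean(long_description)])
    return out.getvalue()


tsv = build_supplemental_tsv({"SKU-123": "<p>Soft <strong>cotton</strong> tee</p>"})

# Save with an explicit encoding so the file really is UTF-8.
with open("supplemental_feed.tsv", "w", encoding="utf-8", newline="") as f:
    f.write(tsv)
```

The drawback discussed next still applies: every new or edited product has to pass through this script, or the supplemental feed drifts out of date.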

However, while this works in the short term, it gets complicated in the long term. What happens when you add a new product next week? You will have to manually create a description, clean it up and add it to your Google Sheet. What happens in six months’ time when you realise a description on your website is slightly wrong? Maybe the size is wrong, so you correct the description on your website, but forget that the supplemental feed is overriding it. The error you have fixed on your site stays live in Merchant Centre, because the supplemental feed’s version wins, so you have to change it there too. All you are doing is creating extra work down the line, which is why the preferred option will always be to fix the problem at its source.

A third and final option, and a last resort, is to create a set of feed rules within Merchant Centre to strip out the additional data. However, this can cause even more complications and takes a lot of effort to set up, so we really would not recommend it.

Benefits of Using Valid UTF-8 Encoding in XML Formats for Online Stores

Utilizing valid UTF-8 encoding in XML formats brings forth a host of advantages that significantly enhance data accuracy, compatibility, and overall performance for online stores. The adoption of UTF-8 encoding ensures seamless communication of textual data across various systems and devices, resulting in a more effective and reliable online shopping experience. Let’s delve into how adhering to valid UTF-8 encoding can bring about positive transformations for online stores:

Precise Data Accuracy and Integrity:

Valid UTF-8 encoding lets you represent characters from diverse languages and scripts accurately. When each character is correctly encoded and decoded, the risk of data misinterpretation or corruption is greatly reduced. This is especially important for online stores that serve a global audience, where product details, descriptions and other text need to come through exactly as written, whatever the language.

Enhanced Compatibility Across Platforms:

UTF-8 encoding is widely supported across platforms, operating systems and devices. Using valid UTF-8 in your XML formats helps ensure that your product data can be exchanged and displayed across different systems without compatibility problems. This extends to search engines, browsers and other tools that process XML data, widening the reach of your online store.

Seamless Multilingual Support:

Online stores often cater to customers from around the globe who speak various languages. Valid UTF-8 encoding facilitates the seamless representation of characters from a multitude of languages, ensuring that your product information is accurately displayed for all customers. This is particularly essential for product names, descriptions, and any other textual content that requires multilingual support.

Optimized Search Engine Performance:

The success of online stores heavily relies on search engines driving traffic to their pages. By utilizing valid UTF-8 encoding in XML formats, you enable search engines to accurately index and display your product content, irrespective of the language or script used. This elevates your store’s visibility and searchability on a global scale, potentially increasing organic traffic.

Swift Performance and Efficient Processing:

Valid UTF-8 encoding is not only well supported but also compact: ASCII characters take a single byte, so text in Latin-script languages stays small. Because valid data can be parsed without errors or workarounds, it is processed smoothly, and pages that display text correctly contribute to a better user experience, which can positively affect metrics such as bounce rates, customer engagement and conversion rates.

Enhanced Accessibility for All Users:

Valid UTF-8 encoding helps ensure that your product text reaches all users intact, including those who rely on assistive technologies such as screen readers, which cannot meaningfully read out garbled characters. This supports your commitment to accessibility and inclusivity, in line with modern web standards and best practices.

Future-Proofing Your Data:

UTF-8 encoding is a well-established and robust standard. By adhering to valid UTF-8 encoding in your XML formats, you safeguard your product data against potential shifts in encoding standards. This protects the longevity and consistency of your online store’s content.

Conclusion

Above is just a quick overview of the issues that Invalid UTF-8 encoding in your description can cause, and how easily it can go undetected. With the guide above, you should now know what to look for, the problems it can cause for the overall performance of a Google Shopping campaign and, more importantly, how to fix it.
