Auditing websites, especially large enterprise sites, can be cumbersome and time-consuming. A single SEO feature can easily yield several thousand data points. Prioritisation is therefore essential, and tackling the highest-priority issues first makes a real difference.
One issue that varies in terms of prioritisation requirements is changes to metadata. Particularly on larger websites, metadata across sections of your site may change both intentionally and unintentionally. It’s important to understand when this happens.
This can include instances where the incorrect language has been used on market-specific sites, templates have not been used or worked as intended, or metadata has been overwritten due to technical or manual faults. This can all negatively impact rankings and CTR.
Thankfully, if you’re extracting and recording metadata at intervals, tracking whether a change has occurred is simple.
The Quick and Easy Solution
With smaller datasets, aligning the metadata in a spreadsheet, with each column representing one type of metadata or string (e.g. title) and a date, provides the necessary skeleton to use a straightforward IF statement to identify whether there have been any changes.
The IF statement results in a “Yes” if there is any difference between the two cells and a “No” if there are none.
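In Python, the same check the IF statement performs can be sketched in a few lines (the URLs and titles below are invented for illustration):

```python
# Hypothetical titles extracted on two dates, keyed by URL.
old_titles = {"/home": "Buy Widgets Online", "/about": "About Us"}
new_titles = {"/home": "Buy Widgets Online", "/about": "About Our Company"}

# Equivalent of the spreadsheet IF: "Yes" when the strings differ, "No" otherwise.
changed = {url: ("Yes" if old_titles[url] != new_titles[url] else "No")
           for url in old_titles}
print(changed)  # {'/home': 'No', '/about': 'Yes'}
```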
The limitations of this method are that it can be a lengthy manual process, there is a limit to the amount of data that can be processed at any one time, and the results cannot reveal how significant any change is.
The Solution that Makes a Difference
One alternative that reveals the extent of a change is to determine the character length of each string (LEN in Excel) and find the difference between the two dates.
This approach produces a wider range of results, enabling prioritisation: the larger the difference, the greater the degree of change. It works particularly well when the metadata has been erased or defaulted to something irrelevant that ignores the typical character limits, either by greatly exceeding them or by failing to utilise the space available.
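The character-length comparison translates directly to Python, where LEN corresponds to the built-in len() (the titles below are invented for illustration):

```python
# A title from two snapshot dates; the second has defaulted to something useless.
old_title = "Buy Widgets Online | ExampleShop"
new_title = "Home"

# LEN in Excel corresponds to len() in Python; the absolute
# difference gives a rough magnitude of change for prioritisation.
length_diff = abs(len(old_title) - len(new_title))
print(length_diff)  # 28
```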
You can couple this approach with the previous IF statement method to ensure that changes between strings which coincidentally share the same character length do not slip through.
However, that added fail-safe reveals the greatest flaw of this method: it assumes that character difference is a suitable measure of change to metadata. That is not necessarily true.
Metadata is often written to suit the character limit, regardless of language, which means that two data points may have very similar character lengths but entirely different contents. This can result in entire issues being missed or high priority pages being placed in lower priority tiers.
Tracking Metadata in Python
This is where Levenshtein distance has a distinct advantage. Levenshtein distance measures the difference between two strings, finding the minimum number of single-character edits, whether they are insertions, deletions or substitutions.
This means that regardless of how superficially similar two strings are, their differences can be quantified into a meaningful metric. Using Python, the fuzzywuzzy module can be used to acquire a Levenshtein distance similarity ratio for comparing two strings.
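To illustrate the underlying metric, here is a minimal pure-Python Levenshtein distance using the standard dynamic-programming approach (a sketch of the concept, not fuzzywuzzy's actual implementation):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    or substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[len(b)]

print(levenshtein("kitten", "sitting"))  # 3
```

With fuzzywuzzy installed, `fuzz.ratio(a, b)` returns a comparable 0–100 similarity score directly, which is usually more convenient than the raw edit distance.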
However, that’s not all it can do. The module also includes a partial ratio and a token sort ratio. The partial ratio compares strings based on partial similarity; essentially, the metric is better adjusted to the addition or subtraction of characters in a string when compared with the original.
The token sort ratio differs in that it is less sensitive to word order. These alterations to the original Levenshtein distance metric are useful as they provide another perspective and are sometimes better adapted when identifying issues with metadata changes.
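The effect of token sorting can be approximated with the standard library's difflib, which fuzzywuzzy builds on (the titles below are invented, and fuzzywuzzy's own implementation differs in detail):

```python
from difflib import SequenceMatcher

def simple_ratio(a: str, b: str) -> int:
    # 0-100 similarity, the same scale fuzzywuzzy uses.
    return round(100 * SequenceMatcher(None, a, b).ratio())

def token_sort_ratio(a: str, b: str) -> int:
    # Sort the words first, so a pure reordering scores as identical.
    def sort_tokens(s: str) -> str:
        return " ".join(sorted(s.lower().split()))
    return round(100 * SequenceMatcher(None, sort_tokens(a), sort_tokens(b)).ratio())

a = "Red Shoes | ExampleShop"
b = "ExampleShop | Red Shoes"
print(simple_ratio(a, b))      # below 100: the raw strings differ
print(token_sort_ratio(a, b))  # 100: same words, different order
```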
Beyond Excel: BigQuery
While Python can read from and write to an Excel spreadsheet, and so replicate the first two methods with the fuzzywuzzy comparison in between, this approach has three big limitations.
The first limitation is, as previously mentioned, that the overall size of the dataset is limited. Of course, Python can be scripted to read from and write to multiple spreadsheets, but this is cumbersome and requires maintaining multiple spreadsheets at any one time.
The second limitation ties in with the first in that there is manual upkeep that can be time-consuming and inefficient. This problem can be exacerbated depending on the acquisition process of the data and whether any pre-processing is required prior to running the script.
Finally, good organisational skills are required to ensure files are stored correctly and file deletion or corruption risks are minimised.
The alternative is to use a data warehouse, such as Google’s BigQuery. Using an API and SQL, metadata can be stored in a database and retrieved when necessary. For the purposes of this task and to enable ongoing tracking, Python is used to extract metadata from tracked pages and store it in BigQuery. The script then uses two SQL statements to retrieve two subsets of data.
The first SQL statement returns all metadata from the most recent date that data was acquired. The second SQL statement returns all metadata from the second most recent date. Of course, this can be altered to take any two dates specified, but for the purposes of this exercise, the two most recent dates are all that’s necessary.
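Assuming a snapshot table with url, title, description and snapshot_date columns (all table and column names here are hypothetical), the first statement might look something like this, with the second swapping in the second-most-recent date:

```sql
-- Hypothetical table: project.seo.metadata_snapshots
-- Returns all metadata from the most recent acquisition date.
SELECT url, title, description, snapshot_date
FROM `project.seo.metadata_snapshots`
WHERE snapshot_date = (
  SELECT MAX(snapshot_date)
  FROM `project.seo.metadata_snapshots`
);
```

The second statement can reuse the same shape, filtering on `MAX(snapshot_date)` among rows strictly earlier than the latest date.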
These two datasets are joined on the URL so that the metadata from the two dates can be compared. The results are then pushed into a new BigQuery table. From there, Data Studio dashboards attached to these BigQuery tables can visualise the data in a more digestible format.
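The join-and-compare step can be sketched in plain Python; in a real pipeline the rows would come from, and be written back to, BigQuery via its client library, and the data below is invented for illustration:

```python
from difflib import SequenceMatcher

# Hypothetical result sets from the two SQL statements, keyed by URL.
latest   = {"/home": "Buy Widgets Online", "/about": "About Our Company"}
previous = {"/home": "Buy Widgets Online", "/about": "About Us"}

rows = []
for url in latest.keys() & previous.keys():  # join on URL
    similarity = round(100 * SequenceMatcher(None, previous[url], latest[url]).ratio())
    rows.append({"url": url,
                 "previous_title": previous[url],
                 "latest_title": latest[url],
                 "similarity": similarity})

# Lowest similarity first, i.e. highest priority for review.
rows.sort(key=lambda r: r["similarity"])
for r in rows:
    print(r["url"], r["similarity"])
```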
The result is that changes to metadata can be appropriately viewed and prioritised, whether the issue that requires investigation is expected to be a significant change or a minor one. In addition, from a health-monitoring perspective, this enables major changes to key pages to be identified quickly and efficiently.
The lack of manual work required to produce these reports also saves time. The entire process thereby provides quickly acquired, prioritised, and clearly identified insights for clients.
Entry-level metadata tracking, then, is simple for anyone to implement, provided you're already recording your metadata at intervals. However, with just a little more knowledge and effort you're able to track your metadata in such a way as to provide far more information about the extent and ultimate quality of intended and unintended changes that have occurred.
Of course, the solution that’s best for you will depend largely on the type and size of the website you operate and the amount of insight required. Once you go beyond relatively simple reporting in Excel, things get substantially more challenging. But whichever monitoring report you choose to implement, it can help you catch negative changes to metadata before they impact your SEO performance.