What does WDF*IDF mean?

WDF*IDF stands for ‘Within Document Frequency * Inverse Document Frequency’ and indicates the relevance and weighting of a keyword within a document compared to all documents with similar content.

How does the WDF*IDF analysis work?

The WDF*IDF formula analyses your own text and compares it with other documents on the same topic. You can then optimise your own text based on the result.

How is WDF*IDF calculated?

WDF*IDF calculates the term weight W of a given word i in a document j. The formula multiplies the within-document frequency of word i in document j by the inverse document frequency of that word across many relevant documents. The formula is therefore W_i,j = WDF_i,j * IDF_i.
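Written out in full, using the WDF and IDF definitions given elsewhere in this article, the term weight can be expressed as follows. This is only a restatement of the article's own formulas in LaTeX notation, with L as the text length in words, N_D as the number of relevant documents and N_i as the number of documents containing the keyword:

```latex
W_{i,j} = \mathrm{WDF}_{i,j} \cdot \mathrm{IDF}_{i}
        = \frac{\log_2\left(\mathrm{Freq}_{i,j} + 1\right)}{\log_2(L)}
          \cdot \log_{10}\left(1 + \frac{N_D}{N_i}\right)
```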

What is the function of IDF and how is it calculated?

The inverse document frequency is not so easy to explain using a translation. In short, this formula sets the keyword in relation to all documents that contain this word. To calculate it, two document counts are needed: the number of all relevant documents on the same topic (N_D), which is divided by the number of documents that contain the keyword being examined (N_i). As with WDF, the logarithm compresses the result to avoid outliers. The formula for calculating IDF is: IDF_i = log10 (1 + N_D / N_i)

What are the benefits of a WDF*IDF analysis for SEO texts?

It ensures that the most important keywords are mentioned often enough and compares their weighting with the content of other websites. It also helps to find keywords that increase relevance.

Why is WDF*IDF indispensable for effective SEO?

Keywords and content on a page continue to play an important role in search engine optimisation. Although the influence of these factors has decreased over the years, they are still an important indicator of the relevance of a website. That is why a good WDF*IDF analysis is essential to cover all important keywords and use them in the text. Many WDF*IDF tools show, among other things, which keywords are still missing that occur very frequently in other documents. In addition, WDF*IDF helps in SEO with the holistic creation of content. In addition to keywords, the result shows further terms that often appear in connection with the main keyword. This can further increase the relevance of the content. If the topic is not yet fully described, these relevance-enhancing keywords reveal further topics that have not been considered before. Thus, you can use WDF*IDF tools to ensure that all keywords appear in the text and are sufficiently covered and, in addition, further enrich the text with relevance-enhancing terms. The content then appears very relevant and extensive to the search engine crawler, increasing the likelihood of a good ranking.

What is the difference between TF*IDF and WDF*IDF?

The two formulas are often used synonymously. In fact, however, there is a difference. As we have just learnt, WDF sets the weighting of a term in relation to the rest of the document and compresses it with the logarithm. TF, on the other hand, is a conventional calculation of the keyword density, i.e. the ‘term frequency’. Formerly a proven tool, keyword density is rarely used today because the mere count of a keyword does not provide any reliable information about its weighting. The formulas are therefore very similar, but lead to different results. While TF*IDF tends to produce more extreme values and reacts more sensitively, WDF*IDF produces a compressed value that is easier to understand and less sensitive to so-called ‘outliers’.

What are the disadvantages of WDF*IDF?

Despite the many advantages of WDF*IDF, this formula is not a panacea for SEO. The content must be well and attractively written even with WDF*IDF and should be supported with graphics or images. After all, content is just one of many factors for effective SEO. The loading time of the website and, above all, mobile optimisation play at least as important a role as content. Last but not least, the competition for the keywords matters, as does how Google classifies the domain as a whole. Thanks to WDF*IDF, the content can be very well optimised to satisfy the user's intention and to cover the topic in detail. However, WDF*IDF analyses words based on their frequency, without fully taking into account the context or semantic meaning. This can lead to a neglect of the overall context and the relevance of the content. It is therefore important to remember that it is no substitute for expertise, knowledge and research. We therefore recommend using WDF*IDF as an optimisation aid, not as a replacement for them. You should also be aware that WDF*IDF can encourage keyword stuffing, which often makes texts unpleasant to read. It is therefore advisable to use WDF*IDF sensibly and deliberately, rather than thoughtlessly peppering your text with the words found.

WDF*IDF

What is the function of WDF and how is it calculated?

The name ‘Within Document Frequency’ describes the purpose of WDF quite well: it roughly means ‘frequency within the document’. WDF calculates how often a particular word occurs in the text. Some people may now think of the well-known keyword density, and WDF actually works in a similar way. However, the logarithm in the formula compresses the result, so that a very frequent repetition of the keyword does not lead to a significantly better score. The relative frequency of a word is calculated by placing the keyword in relation to the other words. This is to prevent meaningless keyword stuffing. To calculate the WDF value, you need the frequency of the keyword (i) in the document (j). Add 1 to this frequency and take the base-2 logarithm, then divide the result by the base-2 logarithm of the total length (L) of the text. The formula is therefore: WDF_i,j = log2 (Freq_i,j + 1) / log2 (L).

The abbreviation WDF*IDF stands for “Within Document Frequency * Inverse Document Frequency”. WDF*IDF can be used to analyze and evaluate texts. WDF determines the relevance of a word within the content, and IDF determines the weighting of that word compared to other documents with similar content.

Calculate and understand WDF*IDF

While the formula WDF*IDF is not particularly difficult, the calculation of the two individual parts WDF and IDF requires significantly more mathematical understanding. First, let’s take a look at the main formula and shed some light on it:

W_i,j = WDF_i,j * IDF_i

The long-winded explanation is the following: WDF*IDF calculates the term weight W of a specified word i in document j. So the formula multiplies the within-document frequency of a word i in a given document j by the inverse document frequency of this word across many relevant documents.

An example for easier understanding:

In our wiki there is an article on the topic “target group”. Now we want to examine how the main keyword “target group” performs in relation to other texts of the same category. To find out this weighting, we calculate the WDF*IDF value of this keyword. The higher the value, the more relevance and weight the keyword has compared to other documents.

The WDF value of our article is 0.5392 and the IDF value is 2.5117. If we multiply the two values, we get the WDF*IDF value of 1.354. This is a good value. We will now take a closer look at how the individual values come about in detail.
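The multiplication above can be checked with a few lines of Python. This is a minimal sketch; the function name `term_weight` is our own choice for illustration, and the two input values are simply the ones quoted in the example:

```python
def term_weight(wdf: float, idf: float) -> float:
    """Return the term weight W = WDF * IDF."""
    return wdf * idf


# Values quoted in the example for the keyword "target group".
wdf_value = 0.5392
idf_value = 2.5117

weight = term_weight(wdf_value, idf_value)
print(round(weight, 3))  # 1.354
```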

The calculation and function of WDF

The name “Within Document Frequency” already describes the purpose of WDF quite well. WDF calculates how often a certain word occurs in the text. Some people might now think of the well-known keyword density, and WDF actually works similarly. However, the logarithm in the formula compresses the result, so that a very frequent repetition of the keyword does not lead to a significantly better score. So the relative frequency of a word is calculated by putting the keyword in relation to the other words. This is to prevent senseless keyword stuffing.

To calculate the WDF value, you need the frequency of the keyword (i) in the document (j). Add 1 to this frequency and take the base-2 logarithm, then divide the result by the base-2 logarithm of the total length (L) of the text.
Small tip: If you don’t have a log2 button on your calculator, you can still calculate log2(VALUE) as ln(VALUE) / ln(2).

WDF_i,j = log₂ (Freq_i,j + 1) / log₂ (L)
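A side effect of the calculator tip is worth spelling out: because WDF is a ratio of two logarithms, the ln(2) terms cancel, so any logarithm base gives the same WDF. The sketch below illustrates this; the function names and the sample numbers (5 occurrences in 100 words) are made up for the illustration and do not come from the article:

```python
import math


def wdf_log2(freq: int, length: int) -> float:
    """WDF exactly as written in the formula: log2(freq + 1) / log2(length)."""
    return math.log2(freq + 1) / math.log2(length)


def wdf_ln(freq: int, length: int) -> float:
    """WDF via the calculator tip.

    (ln(freq + 1) / ln(2)) / (ln(length) / ln(2)) simplifies to
    ln(freq + 1) / ln(length), because ln(2) cancels out.
    """
    return math.log(freq + 1) / math.log(length)


# Hypothetical sample values: a keyword used 5 times in a 100-word text.
same = math.isclose(wdf_log2(5, 100), wdf_ln(5, 100))
print(same)  # True
```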

Let’s stick with our example:

In our article, the keyword “target group” appears a total of 37 times. This corresponds to a Freq_i,j of 37. The total length of the text (L) is 850 words. If we put these numbers into our formula, we get:

WDF = log2 (37 + 1) / log2 (850)

WDF = 0.5392

If we increase the keyword count of “target group” by 20, to a total of 57, this would result in a WDF value of 0.6019. Although the keyword count has increased by 54%, the WDF value has only increased by 0.0627 points. If, on the other hand, the keyword “target group” appeared only 17 times in the text, i.e. 20 times fewer, the WDF value would be 0.4285. So it has decreased by more than 0.1. This shows that it does not help much to mention the keyword unnaturally often in a text.
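The three scenarios can be reproduced with a short script. This is a sketch under the assumption that the text length stays at 850 words in all three cases, as in the example; the function name `wdf` is our own. Note that the figures in the text appear to be cut off after four decimals rather than rounded, so the last printed digit can differ slightly.

```python
import math


def wdf(freq: int, length: int) -> float:
    """WDF = log2(freq + 1) / log2(length)."""
    if freq < 0 or length < 2:
        raise ValueError("freq must be >= 0 and length must be >= 2")
    return math.log2(freq + 1) / math.log2(length)


TEXT_LENGTH = 850  # total number of words in the example article

# Keyword counts from the example: 20 fewer, the actual count, 20 more.
results = {count: wdf(count, TEXT_LENGTH) for count in (17, 37, 57)}

for count, value in results.items():
    print(f"{count:>2} occurrences -> WDF {value:.4f}")

# Adding 20 occurrences gains less than removing 20 occurrences loses.
gain = results[57] - results[37]
loss = results[37] - results[17]
print(f"gain {gain:.4f}, loss {loss:.4f}")
```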

The calculation and function of IDF

The “Inverse Document Frequency” cannot be explained so nicely with the help of a translation. In short, this formula relates the keyword to all documents that contain this word. Two document counts are needed for the calculation: the number of all relevant documents on the same topic (N_D), which is divided by the number of documents that contain the examined keyword (N_i). Again, the logarithm compresses the result to avoid outliers.

IDF_i = log10 (1 + N_D / N_i)

Again, let’s turn to our example:

We have classified the text within the main category “Marketing”. According to Google, there are about 2,970,000,000 results (N_D) for this. Now we still need the number of results in this area that contain our keyword “target group”. To do this, we simply enter “marketing target group” in Google and get around 9,170,000 results (N_i). Let’s now put these numbers into the formula:

IDF = log10 (1 + 2,970,000,000 / 9,170,000)

IDF = 2.5117
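The same calculation as a small Python sketch. The function name `idf` and its parameter names are our own; the two counts are the approximate Google result numbers from the example:

```python
import math


def idf(n_docs: int, n_docs_with_term: int) -> float:
    """IDF = log10(1 + N_D / N_i)."""
    if n_docs_with_term <= 0:
        raise ValueError("at least one document must contain the keyword")
    return math.log10(1 + n_docs / n_docs_with_term)


N_D = 2_970_000_000  # results for the main category "Marketing"
N_I = 9_170_000      # results for "marketing target group"

idf_value = idf(N_D, N_I)
print(f"{idf_value:.4f}")  # 2.5117
```

The `+ 1` inside the logarithm keeps the result above zero even when every document contains the keyword, in which case the IDF is log10(2).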

In combination, the WDF*IDF formula thus gives an approximate indication of how relevant the keyword is compared to other documents with the same topic and keyword. Of course, the reliability and accuracy of the results increase with the number of documents analyzed. In practice, however, the effort required is so high that it is impossible to include all documents. Furthermore, it is important that this analysis is performed for each important keyword in a text, in order to cover the topic as comprehensively as possible and not to forget anything.
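Running the analysis for every keyword by hand quickly becomes tedious, so here is a minimal sketch of how it could be automated for a small set of documents. Everything in it is an assumption made for illustration: the function names, the very simple tokenizer (lowercase single words only, so a two-word keyword such as “target group” is split into two terms), and the tiny sample texts. Real WDF*IDF tools work on far larger document sets.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split a text into lowercase single words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def wdf_idf_scores(document: str, other_documents: list[str]) -> dict[str, float]:
    """Return the term weight W = WDF * IDF for every word in `document`."""
    tokens = tokenize(document)
    if len(tokens) < 2:
        raise ValueError("document must contain at least two words")
    counts = Counter(tokens)

    # The document itself is part of the collection, so N_i is always >= 1.
    collection = [set(tokens)] + [set(tokenize(d)) for d in other_documents]
    n_docs = len(collection)

    scores = {}
    for term, freq in counts.items():
        n_with_term = sum(term in doc for doc in collection)
        wdf = math.log2(freq + 1) / math.log2(len(tokens))
        idf = math.log10(1 + n_docs / n_with_term)
        scores[term] = wdf * idf
    return scores


# Hypothetical mini collection, purely for demonstration.
own_text = "target group marketing defines the target group"
others = ["marketing reaches customers", "the marketing plan defines goals"]

scores = wdf_idf_scores(own_text, others)
for term, weight in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{term:<10} {weight:.3f}")
```

In this toy example “target” and “group” score highest, because they are repeated in our own text and do not occur in the other documents, while “marketing” scores lowest because every document contains it.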

WDF*IDF is indispensable for effective SEO

Keywords and content of a page still play an important role in search engine optimization. Although the influence of these factors has decreased over the years, they are still an important indicator of the relevance of a website. Therefore, a good WDF*IDF analysis is essential to cover all important keywords and use them in the text. Many WDF*IDF tools show, among other things, which keywords are still missing from your text even though they occur very often in other documents.

Moreover, WDF*IDF helps in SEO with holistic content creation. Besides keywords, the result shows other terms that frequently appear in connection with the main keyword. This can further increase the relevance of the content. If the topic has not yet been described comprehensively, these relevance-increasing keywords uncover further topic areas that were not previously considered.
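How such a “missing keywords” check might look in its simplest form is sketched below. This is not how any particular tool works: the function name `missing_terms`, the `min_share` threshold and the sample texts are all hypothetical, and the sketch only counts in how many other documents a word appears instead of computing full WDF*IDF values.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split a text into lowercase single words."""
    return re.findall(r"[a-z]+", text.lower())


def missing_terms(own_text: str, other_texts: list[str], min_share: float = 0.5) -> list[str]:
    """Return words used in at least `min_share` of the other texts but not in our own."""
    own_terms = set(tokenize(own_text))

    # Count in how many of the other documents each word appears.
    document_frequency = Counter()
    for text in other_texts:
        document_frequency.update(set(tokenize(text)))

    threshold = min_share * len(other_texts)
    return sorted(
        term
        for term, n_docs in document_frequency.items()
        if n_docs >= threshold and term not in own_terms
    )


# Hypothetical sample texts, purely for demonstration.
own = "target group marketing"
competitors = [
    "target group persona marketing",
    "persona research for marketing",
    "marketing budget",
]

print(missing_terms(own, competitors))  # ['persona']
```

A real analysis would also filter out filler words such as “for” or “the”, which this sketch does not do.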

Thus, with the help of WDF*IDF tools, you can ensure that all keywords appear in the text and are sufficiently covered, and you can additionally enrich the text with relevance-increasing terms. To the search engine crawler, the content then appears very relevant and extensive, so that the probability of a good ranking increases.

The small but subtle difference between WDF*IDF and TF*IDF

The two formulas are often used synonymously. In fact, however, there is a difference. WDF, as we just learned, puts the weight of a term in relation to the rest of the document and compresses it by the logarithm. TF, on the other hand, is a common calculation of keyword density, or “term frequency.” Formerly a proven tool, keyword density is rarely used today, as the mere number of a keyword does not provide reliable information about its weighting.

The formulas are therefore very similar, but produce different results. WDF*IDF produces a compressed value that is easier to understand and less sensitive to so-called “outliers”, while TF*IDF analysis tends to output more extreme values and reacts more sensitively.
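The different sensitivity can be made visible with a small comparison. The sketch assumes that TF is the plain keyword density, i.e. keyword count divided by text length; the article does not spell out a TF formula, so this definition is our assumption. The counts and the text length are taken from the “target group” example.

```python
import math


def keyword_density(freq: int, length: int) -> float:
    """Assumed TF: plain keyword density, count divided by text length."""
    return freq / length


def wdf(freq: int, length: int) -> float:
    """WDF = log2(freq + 1) / log2(length)."""
    return math.log2(freq + 1) / math.log2(length)


TEXT_LENGTH = 850

# How much does each measure grow when the keyword count rises from 17 to 57?
density_growth = keyword_density(57, TEXT_LENGTH) / keyword_density(17, TEXT_LENGTH)
wdf_growth = wdf(57, TEXT_LENGTH) / wdf(17, TEXT_LENGTH)

print(f"keyword density grows by a factor of {density_growth:.2f}")
print(f"WDF grows by a factor of {wdf_growth:.2f}")
```

The density grows in direct proportion to the keyword count, while the logarithm keeps the growth of WDF much smaller.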

WDF*IDF is not the holy content grail

Despite the many advantages of WDF*IDF, this formula is not a panacea for SEO. The content must be well and attractively written even with WDF*IDF and should be supported with graphics or images. After all, the content is only one of many factors for effective SEO. The loading time of the website and especially mobile optimization play at least as big a role as the content. Last but not least, the competition for the keywords is also important, as is how Google ranks the domain overall.

Thanks to WDF*IDF, the content can be optimized very well to satisfy the user intention and cover the topic in detail. However, it cannot replace expertise, knowledge and research. Therefore, we recommend using WDF*IDF as an aid to optimization, not as a substitute for them. In addition, you should always keep in mind that WDF*IDF can lead to keyword stuffing, and texts are often not pleasant to read as a result. It is therefore advisable to use WDF*IDF sensibly and thoughtfully, instead of thoughtlessly peppering the text with the words found.

Olga Fedukov completed her studies in Media Management at the University of Applied Sciences Würzburg. In eology's marketing team, she is responsible for the comprehensive promotion of the agency across various channels. Furthermore, she takes charge of planning and coordinating the content section on the website as well as eology's webinars.

Olga Fedukov, Marketing Manager, o.fedukov@eology.de, +49 9381 58290138
