The abbreviation WDF*IDF stands for “Within Document Frequency * Inverse Document Frequency”. WDF*IDF can be used to analyze and evaluate texts. WDF measures how prominent a word is within a document, and IDF weights that word against other documents with similar content.
While the formula WDF*IDF is not particularly difficult, the calculation of the two individual parts WDF and IDF requires significantly more mathematical understanding. First, let’s take a look at the main formula and shed some light on it:
W_{i,j} = WDF_{i,j} * IDF_i
In words: WDF*IDF calculates the term weight W of a specified word i in document j. The formula multiplies the within-document frequency of word i in document j by the inverse document frequency of that word across many relevant documents.
An example for easier understanding:
In our wiki there is an article about the topic target group. Now we want to examine how the main keyword “target group” performs in relation to other texts of the same category. To find out this weighting, we calculate the WDF*IDF value of this keyword. The higher the value, the more relevance and weight the keyword has compared to other documents.
The WDF value of our article is 0.5392 and the IDF value is 2.5117. If we multiply the two values, we get a WDF*IDF value of about 1.354. This is a good value. We will now take a closer look at how the individual values come about.
The name “Within Document Frequency” already describes the purpose of WDF quite well. WDF calculates how often a certain word occurs in the text. Some people might now think of the well-known keyword density, and WDF actually works similarly. However, the logarithm in the formula compresses the result, so that very frequent repetition of the keyword does not lead to a significantly better score. The measure also sets the keyword's frequency in relation to the length of the text. This discourages senseless keyword stuffing.
To calculate the WDF value, you need the frequency of the keyword (i) in the document (j) and the total length (L) of the text in words. Add 1 to the frequency, take the base-2 logarithm, and divide the result by the base-2 logarithm of L.
Small tip: If you don’t have a log2 button on your calculator, you can still calculate a log2 with ln(VALUE) / ln(2).
WDF_{i,j} = log2(Freq_{i,j} + 1) / log2(L)
Let’s stick with our example:
In our article, the keyword “target group” appears a total of 37 times. This corresponds to a Freq_{i,j} of 37. The total length of the text (L) is 850 words. If we put these values into our formula, we get:
WDF = log2(37 + 1) / log2(850)
WDF ≈ 0.5392
If we increase the keyword count of “target group” by 20, to a total of 57, this would result in a WDF value of 0.6019. Although the keyword count has increased by 54%, the WDF value has only increased by 0.0627 points. If, on the other hand, the keyword “target group” appeared only 17 times in the text, i.e. 20 times fewer, the WDF value would be 0.4285, a decrease of more than 0.1. Each additional mention therefore adds less than the previous one, which shows that it does not help much to mention the keyword unnaturally often in a text.
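The comparison above can be reproduced with a few lines of Python. This is a minimal sketch of the WDF formula as given in this article; the function name `wdf` and its input checks are our own choices for illustration. The printed values may differ from the article's figures in the last digit because of rounding.

```python
import math


def wdf(freq: int, length: int) -> float:
    """Within-document frequency: log2(freq + 1) / log2(length)."""
    if freq < 0 or length < 2:
        # log2(1) is 0, so a one-word text would mean dividing by zero.
        raise ValueError("freq must be >= 0 and length must be >= 2")
    return math.log2(freq + 1) / math.log2(length)


# Keyword counts from the example, in a text of 850 words.
for count in (17, 37, 57):
    print(f"{count:>2} occurrences -> WDF = {wdf(count, 850):.4f}")
```

The step from 37 to 57 occurrences raises the value by less than the step from 17 to 37 does, which is the compression effect of the logarithm.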
The “Inverse Document Frequency” cannot be explained so nicely with the help of a translation. In short, this formula relates the keyword to all documents that contain it. Two document counts are needed for the calculation: the number of all relevant documents on the same topic (N_D) and the number of those documents that contain the examined keyword (N_i). N_D is divided by N_i, and the logarithm again compresses the result to dampen outliers.
IDF_i = log10(1 + N_D / N_i)
Let’s return to our example:
We have classified the text within the main category “Marketing”. According to Google, there are about 2,970,000,000 results (N_D) for this. Now we still need the number of results in this area that contain our keyword “target group”. To do this, we simply enter “marketing target group” in Google and get around 9,170,000 results (N_i). Let’s now put these numbers into the formula:
IDF = log10(1 + 2,970,000,000 / 9,170,000)
IDF ≈ 2.5117
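The same calculation as a short Python sketch. The function name `idf` and its parameter names are assumptions made for this example; the two counts are the approximate Google result numbers quoted above, not exact corpus statistics.

```python
import math


def idf(total_docs: int, docs_with_term: int) -> float:
    """Inverse document frequency as used here: log10(1 + N_D / N_i)."""
    if docs_with_term <= 0:
        raise ValueError("docs_with_term must be positive")
    return math.log10(1 + total_docs / docs_with_term)


# N_D: results for the topic, N_i: results that also contain the keyword.
value = idf(2_970_000_000, 9_170_000)
print(f"IDF = {value:.4f}")
```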
Combined, the WDF*IDF formula gives an approximate indication of how relevant the keyword is compared to other documents with the same topic and keyword. The reliability and accuracy of the result increase with the number of documents analyzed, but including every document would take far too much effort, so in practice only a sample is analyzed. It is also important to perform this analysis for every important keyword in a text, so that the topic is covered as comprehensively as possible and nothing is forgotten.
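To run the analysis for several keywords at once, the two parts can be combined in one small helper. The sketch below assumes the formulas from this article; `term_weights` and the layout of the `keywords` mapping are hypothetical names chosen for illustration. Only the “target group” figures come from the example, and further keywords would need their own counts.

```python
import math


def wdf(freq: int, length: int) -> float:
    """Within-document frequency: log2(freq + 1) / log2(length)."""
    return math.log2(freq + 1) / math.log2(length)


def idf(total_docs: int, docs_with_term: int) -> float:
    """Inverse document frequency: log10(1 + N_D / N_i)."""
    return math.log10(1 + total_docs / docs_with_term)


def term_weights(keywords: dict, length: int, total_docs: int) -> dict:
    """Map each keyword to WDF * IDF.

    keywords maps a keyword to a tuple of
    (occurrences in the text, number of documents containing it).
    """
    return {
        keyword: wdf(freq, length) * idf(total_docs, docs_with_term)
        for keyword, (freq, docs_with_term) in keywords.items()
    }


# Figures from the example; add one entry per keyword to be analyzed.
keywords = {"target group": (37, 9_170_000)}
weights = term_weights(keywords, length=850, total_docs=2_970_000_000)

# List the keywords from highest to lowest weight.
for keyword, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{keyword}: {weight:.3f}")
```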
Keywords and content of a page still play an important role in search engine optimization. Although the influence of these factors has decreased over the years, they are still an important indicator of the relevance of a website. A good WDF*IDF analysis therefore helps to cover all important keywords and use them in the text, because many WDF*IDF tools show, among other things, which keywords are still missing even though they occur very often in other documents.
Moreover, WDF*IDF helps in SEO for holistic content creation. Besides keywords, the result shows other terms that frequently appear in connection with the main keyword. This can further increase the relevance of the content. If the topic is not yet comprehensively described, these relevance-increasing keywords uncover further topic areas that were not previously considered.
Thus, with the help of WDF*IDF tools, you can ensure that all keywords appear in the text and are covered in sufficient depth, and you can additionally enrich the text with relevance-increasing terms. To the search engine crawlers, the content then appears very relevant and extensive, so that the probability of a good ranking increases.
WDF*IDF and TF*IDF are often used synonymously. In fact, however, there is a difference. WDF, as we just learned, puts the weight of a term in relation to the rest of the document and compresses it with the logarithm. TF, on the other hand, is a common calculation of keyword density, or “term frequency.” Formerly a proven tool, keyword density is rarely used today, as the mere count of a keyword does not provide reliable information about its weighting.
The formulas are therefore very similar, but produce different results. WDF*IDF produces a compressed value that is easier to interpret and less sensitive to so-called “outliers”, while TF*IDF analysis tends to output more extreme values and reacts more sensitively.
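The difference in sensitivity can be illustrated with the numbers from the earlier example. Term frequency has several variants; this sketch assumes the plain keyword density (occurrences divided by text length) described above, and the function names `tf` and `wdf` are chosen only for this example.

```python
import math


def tf(freq: int, length: int) -> float:
    """Plain keyword density: occurrences divided by text length."""
    return freq / length


def wdf(freq: int, length: int) -> float:
    """Within-document frequency: log2(freq + 1) / log2(length)."""
    return math.log2(freq + 1) / math.log2(length)


length = 850   # words in the text
baseline = 37  # keyword count of the original article

# Compare how strongly each measure reacts to a changed keyword count.
for count in (17, 37, 57):
    tf_change = tf(count, length) / tf(baseline, length) - 1
    wdf_change = wdf(count, length) / wdf(baseline, length) - 1
    print(f"{count:>2} occurrences: TF {tf_change:+.1%}, WDF {wdf_change:+.1%}")
```

TF changes in direct proportion to the keyword count, while WDF moves by a much smaller percentage for the same change.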
Despite the many advantages of WDF*IDF, this formula is not a panacea for SEO. Even with WDF*IDF, the content must be well and attractively written and should be supported with graphics or images. After all, the content is only one of many factors for effective SEO. The loading time of the website and especially mobile optimization play at least as big a role as the content. Last but not least, the competition for the keywords and how Google ranks the domain overall are also important.
Thanks to WDF*IDF, the content can therefore be optimized very well to satisfy the user intention and cover the topic in detail. However, it cannot replace expertise, knowledge and research. We therefore recommend using WDF*IDF as a supporting tool for optimization. You should also keep in mind that applying WDF*IDF mechanically can lead to keyword stuffing, which often makes texts unpleasant to read. It is therefore advisable to use WDF*IDF sensibly and thoughtfully, instead of thoughtlessly spiking the text with the suggested words.
WDF*IDF stands for “Within Document Frequency * Inverse Document Frequency” and indicates the relevance and weighting of a keyword within a document compared to all documents with similar content.
The WDF*IDF formula evaluates your own text and puts it in relation to other documents with similar content. You can then optimize your own text accordingly.
It ensures that the most important keywords are mentioned frequently enough and compares their weighting with the content of other websites. It also helps to find keywords that increase relevance.