Term frequency / inverse document frequency (tf-idf) is a way of measuring how relevant a word is to a document, based on how often the word appears in that document and how often it appears across the full set of documents the document belongs to. Words that appear very often in many documents will be ranked lower than words that appear a lot in just one document.


A calculation for tf-idf is done by multiplying (the number of times a word appears in a document divided by the document's total word count) by (the logarithm [with a base of 10] of (the total document count divided by the number of documents that contain the word)). As we recall from math class in high school, the logarithm of a number is the exponent to which the base must be raised to produce that number. So the logarithm of 1000 with a base of 10 is 3 (10 x 10 x 10 = 1000).


So if we have a document which contains a term 20 times and has 1000 total words, and the document is part of a set of 10,000 documents, 50 of which contain this term, the tf-idf of the term in that document will be 0.046. If 500 documents in the full set contain this term, the tf-idf will be 0.026. If the document with 1000 words contains the term 200 times and is from the full set of 10,000 documents, 50 of which contain this term, the tf-idf will be 0.460.
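The three figures above can be checked with a few lines of Python. This is only a minimal sketch of the calculation as described here; the function name tf_idf and its parameter names are made up for illustration and do not come from any library.

```python
import math


def tf_idf(term_count, total_words, total_docs, docs_with_term):
    """Return tf-idf using a base-10 logarithm, as described in the text."""
    tf = term_count / total_words                  # term frequency
    idf = math.log10(total_docs / docs_with_term)  # inverse document frequency
    return tf * idf


# The three scenarios from the paragraph above.
rare_term = tf_idf(20, 1000, 10_000, 50)
common_term = tf_idf(20, 1000, 10_000, 500)
heavy_use = tf_idf(200, 1000, 10_000, 50)

print(f"{rare_term:.3f}")    # 0.046
print(f"{common_term:.3f}")  # 0.026
print(f"{heavy_use:.3f}")    # 0.460
```

Note that software libraries often use a slightly different idf definition (for example, a natural logarithm or added smoothing), so their scores may not match these hand calculations.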


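The formula can be written like this:

tf-idf = (number of times the term appears in the document / total words in the document) x log10(total number of documents / number of documents containing the term)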

To get the logarithm, first divide the total document count by the number of documents containing the key term, press equals, and then press the key labeled log10 on the scientific version of your calculator.
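The same two calculator steps can be reproduced in Python with the standard library's math.log10 function. This is a small sketch using the counts from the earlier example; the variable names are illustrative only.

```python
import math

total_docs = 10_000   # documents in the full set
docs_with_term = 50   # documents that contain the key term

ratio = total_docs / docs_with_term  # step 1: divide and press equals
idf = math.log10(ratio)              # step 2: press the log10 key

print(ratio)          # 200.0
print(round(idf, 3))  # 2.301
```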



 
 

In 2015, Japan passed major amendments to its Act on the Protection of Personal Information (APPI), which was first enacted in 2003. The amended data protection legislation established the Personal Information Protection Commission (PPC). The PPC works with government ministries to set guidelines on how to handle personal information, data breaches, data transfers to foreign countries, and data anonymization.

Specific guidelines also regulate how personal genetic data, credit information, and healthcare records are to be protected.


Data cannot be transferred to third parties unless the person concerned consents, but there are exceptions, including those to protect public health. Data transfers can also proceed if notification has been provided and no response has been received during an opt-out period.


An initial exemption for businesses which handle the data of fewer than 5,000 people was repealed in 2017.


The APPI can apply when foreign businesses acquire the personal data of Japanese citizens in order to provide goods and services in Japan.




 
 

Note that the Edge browser for Windows 10 is set to keep some of its core processes running in the background so that it will open and run more quickly. This feature is called startup boost. You can check its status by going to: edge://settings/system


If you're relying on Chrome or Firefox instead, it may make sense to turn off startup boost and free up memory for other processes.

 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.

© 2015 by Sean O'Shea.
