Can we trust Web-page metadata?
This paper reports a statistical study of embedded metadata in a sample of more than 4 million HTML Web-pages. It attempts to determine and quantify the validity of this metadata and, in particular, whether the metadata is trustworthy enough to determine the topic of a Web-page. Datasets are collected by a Web crawler running both as a general crawler and as a focused crawler. Metadata fields 'title', 'author