Search Preview
Burton DeWilde
bdewilde.github.io
data scientist / physicist / filmmaker
SEO audit: Content analysis
Check | Result |
---|---|
Language | Error! No language localisation is found. |
Title | Burton DeWilde |
Text / HTML ratio | 50 % |
Frame | Excellent! The website does not use iFrame solutions. |
Flash | Excellent! The website does not have any flash contents. |
Keywords cloud | data Read » language natural Friedman text extraction corpus Data processing quality science I’ve Corpus Thomas NLP task information scraping |
Keywords consistency | |
Headings | |
Images | We found 6 images on this web page. |
SEO Keywords (Single)
Keyword | Occurrence | Density |
---|---|---|
data | 15 | 0.75 % |
Read | 10 | 0.50 % |
» | 10 | 0.50 % |
language | 9 | 0.45 % |
natural | 8 | 0.40 % |
Friedman | 8 | 0.40 % |
text | 7 | 0.35 % |
extraction | 6 | 0.30 % |
corpus | 6 | 0.30 % |
Data | 6 | 0.30 % |
processing | 5 | 0.25 % |
quality | 5 | 0.25 % |
science | 5 | 0.25 % |
I’ve | 5 | 0.25 % |
Corpus | 5 | 0.25 % |
Thomas | 4 | 0.20 % |
NLP | 4 | 0.20 % |
task | 4 | 0.20 % |
information | 4 | 0.20 % |
scraping | 4 | 0.20 % |
SEO Keywords (Two Word)
Keyword | Occurrence | Density |
---|---|---|
Read More | 10 | 0.50 % |
More » | 10 | 0.50 % |
natural language | 8 | 0.40 % |
language processing | 5 | 0.25 % |
data science | 5 | 0.25 % |
Thomas Friedman | 4 | 0.20 % |
here here | 4 | 0.20 % |
Harmony Institute | 3 | 0.15 % |
web scraping | 3 | 0.15 % |
Natural Language | 3 | 0.15 % |
Language Processing | 3 | 0.15 % |
Burton DeWilde | 3 | 0.15 % |
to the | 3 | 0.15 % |
» Friedman | 3 | 0.15 % |
Friedman Corpus | 3 | 0.15 % |
and Creation | 3 | 0.15 % |
Background and | 3 | 0.15 % |
variety of | 3 | 0.15 % |
corpus linguistics | 3 | 0.15 % |
in a | 2 | 0.10 % |
SEO Keywords (Three Word)
Keyword | Occurrence | Density | Possible Spam |
---|---|---|---|
Read More » | 10 | 0.50 % | No |
natural language processing | 5 | 0.25 % | No |
Natural Language Processing | 3 | 0.15 % | No |
here here here | 3 | 0.15 % | No |
More » Friedman | 3 | 0.15 % | No |
» Friedman Corpus | 3 | 0.15 % | No |
Background and Creation | 3 | 0.15 % | No |
a handful of | 2 | 0.10 % | No |
Language Processing NLP | 2 | 0.10 % | No |
Data Quality and | 2 | 0.10 % | No |
Quality and Corpus | 2 | 0.10 % | No |
quality of the | 2 | 0.10 % | No |
and Corpus Stats | 2 | 0.10 % | No |
and Creation post | 2 | 0.10 % | No |
domains Read More | 2 | 0.10 % | No |
see Background and | 2 | 0.10 % | No |
a variety of | 2 | 0.10 % | No |
andor social issue | 1 | 0.05 % | No |
inarguably better than | 1 | 0.05 % | No |
understanding natural language | 1 | 0.05 % | No |
SEO Keywords (Four Word)
Keyword | Occurrence | Density | Possible Spam |
---|---|---|---|
More » Friedman Corpus | 3 | 0.15 % | No |
Read More » Friedman | 3 | 0.15 % | No |
domains Read More » | 2 | 0.10 % | No |
Natural Language Processing NLP | 2 | 0.10 % | No |
Data Quality and Corpus | 2 | 0.10 % | No |
here here here here | 2 | 0.10 % | No |
Quality and Corpus Stats | 2 | 0.10 % | No |
Background and Creation post | 2 | 0.10 % | No |
see Background and Creation | 2 | 0.10 % | No |
Burton DeWilde About Me | 1 | 0.05 % | No |
discussion as it relates | 1 | 0.05 % | No |
as it relates to | 1 | 0.05 % | No |
it relates to a | 1 | 0.05 % | No |
relates to a film | 1 | 0.05 % | No |
to a film andor | 1 | 0.05 % | No |
a film andor social | 1 | 0.05 % | No |
of the discussion as | 1 | 0.05 % | No |
film andor social issue | 1 | 0.05 % | No |
andor social issue Although | 1 | 0.05 % | No |
social issue Although humans | 1 | 0.05 % | No |
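The density columns in the keyword tables above are consistent with a simple ratio: occurrences divided by the page's total word count, expressed as a percentage. For example, 15 occurrences at 0.75 % implies roughly 2,000 words, and the two-word phrase "Read More" (10 occurrences, 0.50 %) fits the same denominator. The audit tool's exact method is not documented here, so the following is only a minimal sketch under that assumption. The function name `ngram_density` and its `total_words` parameter are hypothetical, and tokens are split on whitespace and kept case-sensitive, since the tables list "data" and "Data" separately.

```python
from collections import Counter


def ngram_density(text, n=1, total_words=None):
    """Return {ngram: (count, density_percent)} for whitespace-separated tokens.

    Assumption: density = 100 * count / total word count, which matches the
    figures in the tables but is not a documented formula of the audit tool.
    """
    tokens = text.split()
    # Allow the caller to supply the page's word count when only a fragment is analysed.
    total = total_words if total_words is not None else len(tokens)
    if total == 0:
        return {}
    # Build overlapping n-grams, e.g. n=2 gives "Read More", "More »", ...
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    return {gram: (count, round(100.0 * count / total, 2))
            for gram, count in counts.items()}


if __name__ == "__main__":
    sample = "Read More » Read More »"
    for gram, (count, density) in ngram_density(sample, n=2).items():
        print(f"{gram} | {count} | {density:.2f} %")
```

Passing `total_words=2000` with 15 occurrences of a token reproduces the 0.75 % shown for "data" in the single-keyword table.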
Internal links in - bdewilde.github.io
About Me
Archive
Intro to Automatic Keyphrase Extraction
On Starting Over with Jekyll
Friedman Corpus (3) — Occurrence and Dispersion
Friedman Corpus (1) — Background and Creation
Friedman Corpus (2) — Data Quality and Corpus Stats
While I Was Away
Intro to Natural Language Processing (2)
Intro to Natural Language Processing (1)
A Data Science Education?
Connecting to the Data Set
Data, Data, Everywhere
Burton DeWilde
Bdewilde.github.io Spined HTML
Burton DeWilde
About Me Archive CV

Intro to Automatic Keyphrase Extraction
2014-09-23 full-length diamond frequency statistics keyphrase extraction graph-based ranking NLP task reformulation
I often apply natural language processing for purposes of automatically extracting structured information from unstructured (text) datasets. One such task is the extraction of important topical words and phrases from documents, commonly known as terminology extraction or automatic keyphrase extraction. Keyphrases provide a concise description of a document’s content; they are useful for document categorization, clustering, indexing, search, and summarization; quantifying semantic similarity with other documents; as well as conceptualizing particular knowledge domains.
Read More »

On Starting Over with Jekyll
2014-08-10 blogging DataKind Disqus Harmony Institute Jekyll website diamond
After another lengthy hiatus from blogging, I’m back! Long story short, I got so frustrated with Blogger’s shortcomings and complications, not to mention the general lack of control over my content, that I lost the will to update my old blog. At the same time, I was putting in longer hours at Harmony Institute and volunteering on the side for DataKind, so I didn’t have much to say outside of official channels. That said, my data life has not gone entirely un-blogged:
Read More »

Friedman Corpus (3) — Occurrence and Dispersion
2013-11-03 corpus linguistics dispersion natural language processing occurrence Thomas Friedman
Thus far, I’ve pseudo-justified why a collection of NYT articles by Thomas Friedman would be interesting to study, actually compiled/scraped the text and metadata (see Background and Creation post), improved/verified the quality of the data, and computed a handful of simple, corpus-level statistics (see Data Quality and Corpus Stats post). Now, onward to actual natural language analysis!
Read More »

Friedman Corpus (2) — Data Quality and Corpus Stats
2013-10-20 corpus linguistics data quality domain expertise metadata Thomas Friedman
With a full-text Friedman corpus finally in hand (see Background and Creation post), my first task was to verify data quality. Given “Garbage In, Garbage Out”, the fun stuff (analysis! plots! Friedman_ebooks?!) had to wait. Yes, it’s a pain in the ass, but this step is really important.
Read More »

Friedman Corpus (1) — Background and Creation
2013-10-15 APIs corpora corpus linguistics natural language processing Thomas Friedman web scraping
Much work in Natural Language Processing (NLP) begins with a large collection of text documents, called a corpus, that represents a written sample of language in a particular domain of study. Corpora come in a variety of flavors: mono- or multi-lingual; category-specific or a representative sampling from a variety of categories, e.g. genres, authors, time periods; simply “plain” text or annotated with additional linguistic information, e.g. part-of-speech tags, full parse trees; and so on. They allow for hypothesis testing and statistical analysis of natural language, but one must be very cautious about applying results derived from a given corpus to other domains.
Read More »

While I Was Away
2013-10-05 hackathon Harmony Institute top links treasury.io
I’ve not posted in almost six months, but I was, like, totally busy. Here’s what I’ve been up to:
Read More »

Intro to Natural Language Processing (2)
2013-04-16 information extraction natural language processing pos-tagging tokenization web scraping
A couple months ago, I posted a brief, conceptual overview of Natural Language Processing (NLP) as applied to the common task of information extraction (IE), that is, the process of extracting structured data from unstructured data, the majority of which is text.
A significant component of my job at HI involves scraping text from websites, press articles, social media, and other sources, then analyzing the quantity and especially quality of the discussion as it relates to a film and/or social issue. Although humans are inarguably better than machines at understanding natural language, it’s impractical for humans to analyze large numbers of documents for themes, trends, content, sentiment, etc., and to do so consistently throughout. This is where NLP comes in.
Read More »

A Data Science Education?
2013-03-03 warrant blogs data science education MOOCs Strata
Given that you’re currently reading a data science blog, you’re probably well aware that online resources for an informal education in data science abound. Blogs are a great place to start (here, here, here, here, here), but topics and pedagogical quality are, let’s be honest, scattershot at best. No comment on the usefulness of this particular blog…
Read More »

Connecting to the Data Set
2013-02-17 csv soundsystem datafest hackathon money munging networking politics Twitter
As a relative newcomer to the field, I’ve been learning and doing data science largely on my own. This is okay, I guess, given access to Stack Overflow, MOOCs, and a handful of O’Reilly’s textbooks, but not ideal. Fortunately, the data science community here in New York seems to be big and active, so opportunities to connect are plentiful.
Read More »

Data, Data, Everywhere
2013-01-19 APIs big data top links self-ruling data hackathon web scraping
As I’ve mentioned before, the Internet is a huge (and ever huger!) repository of data. Much of that is in the form of unstructured text (for which natural language processing comes in handy), but an impressive variety of structured datasets can be found and downloaded, too, if you know where to look.
Here are some of my favorite sources…
Read More »

Burton DeWilde
data scientist / physicist / filmmaker
© 2014 Burton DeWilde. All rights reserved.