bdewilde.github.io

Search Preview

Burton DeWilde

bdewilde.github.io
data scientist / physicist / filmmaker

SEO audit: Content analysis

Language Error! No language localisation is found.
Title Burton DeWilde
Text / HTML ratio 50 %
Frame Excellent! The website does not use iFrame solutions.
Flash Excellent! The website does not have any flash contents.
Keywords cloud data Read » language natural Friedman text extraction corpus Data processing quality science I’ve Corpus Thomas NLP task information scraping
Keywords consistency
Keyword Content Title Description Headings
data 15
Read 10
» 10
language 9
natural 8
Friedman 8
Headings
H1 H2 H3 H4 H5 H6
10 0 0 0 0 0
Images We found 6 images on this web page.

SEO Keywords (Single)

Keyword Occurrence Density
data 15 0.75 %
Read 10 0.50 %
» 10 0.50 %
language 9 0.45 %
natural 8 0.40 %
Friedman 8 0.40 %
text 7 0.35 %
extraction 6 0.30 %
corpus 6 0.30 %
Data 6 0.30 %
processing 5 0.25 %
quality 5 0.25 %
science 5 0.25 %
I’ve 5 0.25 %
Corpus 5 0.25 %
Thomas 4 0.20 %
NLP 4 0.20 %
task 4 0.20 %
information 4 0.20 %
scraping 4 0.20 %
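
The Occurrence and Density columns above appear to be related by a simple ratio: 15 occurrences at 0.75 % implies a page of roughly 2,000 words. The sketch below shows one way such figures could be computed. It is an illustration only, not the audit tool's actual method: the function name `keyword_density` is hypothetical, and both the whitespace tokenization and the choice to divide by the total word count for every phrase length are assumptions. Matching is case-sensitive because the table lists "data" and "Data" separately.

```python
from collections import Counter


def keyword_density(text, n=1):
    """Map each n-word phrase to (occurrences, density in percent).

    Assumptions (not taken from the audit tool): tokens are split on
    whitespace, matching is case-sensitive, and density is measured
    against the total number of words for any phrase length n.
    """
    tokens = text.split()
    total = len(tokens)
    if total == 0:
        return {}
    # Slide a window of n tokens across the text to form phrases.
    phrases = [" ".join(tokens[i:i + n]) for i in range(total - n + 1)]
    counts = Counter(phrases)
    return {phrase: (count, 100.0 * count / total)
            for phrase, count in counts.items()}


if __name__ == "__main__":
    # Synthetic page: 15 occurrences of "data" among 2,000 words.
    sample = " ".join(["data"] * 15 + ["filler"] * 1985)
    occurrences, density = keyword_density(sample)["data"]
    print(occurrences, f"{density:.2f} %")
```

Calling the same function with `n=2`, `n=3`, or `n=4` would give phrase counts analogous to the two-, three-, and four-word tables that follow.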

SEO Keywords (Two Word)

Keyword Occurrence Density
Read More 10 0.50 %
More » 10 0.50 %
natural language 8 0.40 %
language processing 5 0.25 %
data science 5 0.25 %
Thomas Friedman 4 0.20 %
here here 4 0.20 %
Harmony Institute 3 0.15 %
web scraping 3 0.15 %
Natural Language 3 0.15 %
Language Processing 3 0.15 %
Burton DeWilde 3 0.15 %
to the 3 0.15 %
» Friedman 3 0.15 %
Friedman Corpus 3 0.15 %
and Creation 3 0.15 %
Background and 3 0.15 %
variety of 3 0.15 %
corpus linguistics 3 0.15 %
in a 2 0.10 %

SEO Keywords (Three Word)

Keyword Occurrence Density Possible Spam
Read More » 10 0.50 % No
natural language processing 5 0.25 % No
Natural Language Processing 3 0.15 % No
here here here 3 0.15 % No
More » Friedman 3 0.15 % No
» Friedman Corpus 3 0.15 % No
Background and Creation 3 0.15 % No
a handful of 2 0.10 % No
Language Processing NLP 2 0.10 % No
Data Quality and 2 0.10 % No
Quality and Corpus 2 0.10 % No
quality of the 2 0.10 % No
and Corpus Stats 2 0.10 % No
and Creation post 2 0.10 % No
domains Read More 2 0.10 % No
see Background and 2 0.10 % No
a variety of 2 0.10 % No
andor social issue 1 0.05 % No
inarguably better than 1 0.05 % No
understanding natural language 1 0.05 % No

SEO Keywords (Four Word)

Keyword Occurrence Density Possible Spam
More » Friedman Corpus 3 0.15 % No
Read More » Friedman 3 0.15 % No
domains Read More » 2 0.10 % No
Natural Language Processing NLP 2 0.10 % No
Data Quality and Corpus 2 0.10 % No
here here here here 2 0.10 % No
Quality and Corpus Stats 2 0.10 % No
Background and Creation post 2 0.10 % No
see Background and Creation 2 0.10 % No
Burton DeWilde About Me 1 0.05 % No
discussion as it relates 1 0.05 % No
as it relates to 1 0.05 % No
it relates to a 1 0.05 % No
relates to a film 1 0.05 % No
to a film andor 1 0.05 % No
a film andor social 1 0.05 % No
of the discussion as 1 0.05 % No
film andor social issue 1 0.05 % No
andor social issue Although 1 0.05 % No
social issue Although humans 1 0.05 % No

Internal links in - bdewilde.github.io

About Me
Archive
Intro to Automatic Keyphrase Extraction
On Starting Over with Jekyll
Friedman Corpus (3) — Occurrence and Dispersion
Background and Creation
Friedman Corpus (1) — Background and Creation
Data Quality and Corpus Stats
Friedman Corpus (2) — Data Quality and Corpus Stats
While I Was Away
Intro to Natural Language Processing (2)
a brief, conceptual overview
Intro to Natural Language Processing (1)
A Data Science Education?
Connecting to the Data Set
Data, Data, Everywhere
← previous
Burton DeWilde

Bdewilde.github.io Spined HTML


Burton DeWilde
About Me / Archive / CV

Intro to Automatic Keyphrase Extraction
2014-09-23 feature design frequency statistics keyphrase extraction graph-based ranking NLP task reformulation
I often use natural language processing to automatically extract structured information from unstructured (text) datasets. One such task is the extraction of important topical words and phrases from documents, commonly known as terminology extraction or automatic keyphrase extraction. Keyphrases provide a concise summary of a document’s content; they are useful for document categorization, clustering, indexing, search, and summarization; quantifying semantic similarity with other documents; as well as conceptualizing particular knowledge domains. Read More »

On Starting Over with Jekyll
2014-08-10 blogging DataKind Disqus Harmony Institute Jekyll website design
After another lengthy hiatus from blogging, I’m back! Long story short, I got so frustrated with Blogger’s shortcomings and complications, not to mention the general lack of control over my content, that I lost the will to update my old blog. At the same time, I was putting in longer hours at Harmony Institute and volunteering on the side for DataKind, so I didn’t have much to say outside of official channels. That said, my data life has not gone entirely un-blogged: Read More »

Friedman Corpus (3) — Occurrence and Dispersion
2013-11-03 corpus linguistics dispersion natural language processing occurrence Thomas Friedman
Thus far, I’ve pseudo-justified why a collection of NYT articles by Thomas Friedman would be interesting to study, actually compiled/scraped the text and metadata (see Background and Creation post), improved/verified the quality of the data, and computed a handful of simple, corpus-level statistics (see Data Quality and Corpus Stats post). Now, onward to actual natural language analysis! Read More »

Friedman Corpus (2) — Data Quality and Corpus Stats
2013-10-20 corpus linguistics data quality domain expertise metadata Thomas Friedman
With a full-text Friedman corpus finally in hand (see Background and Creation post), my first task was to verify data quality. Given “Garbage In, Garbage Out”, the fun stuff (analysis! plots! Friedman_ebooks?!) had to wait. Yes, it’s a pain in the ass, but this step is really important. Read More »

Friedman Corpus (1) — Background and Creation
2013-10-15 APIs corpora corpus linguistics natural language processing Thomas Friedman web scraping
Much work in Natural Language Processing (NLP) begins with a large collection of text documents, called a corpus, that represents a written sample of language in a particular domain of study. Corpora come in a variety of flavors: mono- or multi-lingual; category-specific or a representative sampling from a variety of categories, e.g. genres, authors, time periods; simply “plain” text or annotated with additional linguistic information, e.g. part-of-speech tags, full parse trees; and so on. They allow for hypothesis testing and statistical analysis of natural language, but one must be very cautious about applying results derived from a given corpus to other domains. Read More »

While I Was Away
2013-10-05 hackathon Harmony Institute top links treasury.io
I’ve not posted in almost six months, but I was, like, totally busy. Here’s what I’ve been up to: Read More »

Intro to Natural Language Processing (2)
2013-04-16 information extraction natural language processing pos-tagging tokenization web scraping
A couple months ago, I posted a brief, conceptual overview of Natural Language Processing (NLP) as applied to the general task of information extraction (IE), that is, the process of extracting structured data from unstructured data, the majority of which is text.
A significant component of my job at HI involves scraping text from websites, news articles, social media, and other sources, then analyzing the quantity and especially quality of the discussion as it relates to a film and/or social issue. Although humans are inarguably better than machines at understanding natural language, it’s impractical for humans to analyze large numbers of documents for themes, trends, content, sentiment, etc., and to do so consistently throughout. This is where NLP comes in. Read More »

A Data Science Education?
2013-03-03 warrant blogs data science education MOOCs Strata
Given that you’re currently reading a data science blog, you’re probably well aware that online resources for an informal education in data science abound. Blogs are a great place to start (here, here, here, here, here), but topics and pedagogical quality are, let’s be honest, scattershot at best. No comment on the usefulness of this particular blog… Read More »

Connecting to the Data Set
2013-02-17 csv soundsystem datafest hackathon money munging networking politics Twitter
As a relative newcomer to the field, I’ve been learning and doing data science largely on my own. This is okay, I guess, given access to Stack Overflow, MOOCs, and a handful of O’Reilly’s textbooks, but not ideal. Fortunately, the data science community here in New York seems to be big and active, so opportunities to connect are plentiful. Read More »

Data, Data, Everywhere
2013-01-19 APIs big data top links free data hackathon web scraping
As I’ve mentioned before, the Internet is a huge (and ever huger!) repository of data. Much of that is in the form of unstructured text, for which natural language processing comes in handy, but an impressive variety of structured datasets can be found and downloaded, too, if you know where to look.
Here are some of my favorite sources… Read More »

← previous ↑
Burton DeWilde
data scientist / physicist / filmmaker
© 2014 Burton DeWilde. All rights reserved.