bdewilde.github.io
Burton DeWilde

Search Preview

Burton DeWilde

bdewilde.github.io
data scientist / physicist / filmmaker

SEO audit: Content analysis

Language

Error! No language localisation is found.

Title

Burton DeWilde

Text / HTML ratio

50 %

Frame

Excellent! The website does not use iFrame solutions.

Flash

Excellent! The website does not have any flash contents.

Keywords cloud

data Read » language natural Friedman text extraction corpus Data processing quality science I’ve Corpus Thomas NLP task information scraping

Keywords consistency

Keyword	Content	Title	Description	Headings
data	15
Read	10
»	10
language	9
natural	8
Friedman	8

Headings

H1	H2	H3	H4	H5	H6
10	0	0	0	0	0

Images

We found 6 images on this web page.

SEO Keywords (Single)

Keyword	Occurrence	Density
data	15	0.75 %
Read	10	0.50 %
»	10	0.50 %
language	9	0.45 %
natural	8	0.40 %
Friedman	8	0.40 %
text	7	0.35 %
extraction	6	0.30 %
corpus	6	0.30 %
Data	6	0.30 %
processing	5	0.25 %
quality	5	0.25 %
science	5	0.25 %
I’ve	5	0.25 %
Corpus	5	0.25 %
Thomas	4	0.20 %
NLP	4	0.20 %
task	4	0.20 %
information	4	0.20 %
scraping	4	0.20 %

SEO Keywords (Two Word)

Keyword	Occurrence	Density
Read More	10	0.50 %
More »	10	0.50 %
natural language	8	0.40 %
language processing	5	0.25 %
data science	5	0.25 %
Thomas Friedman	4	0.20 %
here here	4	0.20 %
Harmony Institute	3	0.15 %
web scraping	3	0.15 %
Natural Language	3	0.15 %
Language Processing	3	0.15 %
Burton DeWilde	3	0.15 %
to the	3	0.15 %
» Friedman	3	0.15 %
Friedman Corpus	3	0.15 %
and Creation	3	0.15 %
Background and	3	0.15 %
variety of	3	0.15 %
corpus linguistics	3	0.15 %
in a	2	0.10 %

SEO Keywords (Three Word)

Keyword	Occurrence	Density	Possible Spam
Read More »	10	0.50 %	No
natural language processing	5	0.25 %	No
Natural Language Processing	3	0.15 %	No
here here here	3	0.15 %	No
More » Friedman	3	0.15 %	No
» Friedman Corpus	3	0.15 %	No
Background and Creation	3	0.15 %	No
a handful of	2	0.10 %	No
Language Processing NLP	2	0.10 %	No
Data Quality and	2	0.10 %	No
Quality and Corpus	2	0.10 %	No
quality of the	2	0.10 %	No
and Corpus Stats	2	0.10 %	No
and Creation post	2	0.10 %	No
domains Read More	2	0.10 %	No
see Background and	2	0.10 %	No
a variety of	2	0.10 %	No
andor social issue	1	0.05 %	No
inarguably better than	1	0.05 %	No
understanding natural language	1	0.05 %	No

SEO Keywords (Four Word)

Keyword	Occurrence	Density	Possible Spam
More » Friedman Corpus	3	0.15 %	No
Read More » Friedman	3	0.15 %	No
domains Read More »	2	0.10 %	No
Natural Language Processing NLP	2	0.10 %	No
Data Quality and Corpus	2	0.10 %	No
here here here here	2	0.10 %	No
Quality and Corpus Stats	2	0.10 %	No
Background and Creation post	2	0.10 %	No
see Background and Creation	2	0.10 %	No
Burton DeWilde About Me	1	0.05 %	No
discussion as it relates	1	0.05 %	No
as it relates to	1	0.05 %	No
it relates to a	1	0.05 %	No
relates to a film	1	0.05 %	No
to a film andor	1	0.05 %	No
a film andor social	1	0.05 %	No
of the discussion as	1	0.05 %	No
film andor social issue	1	0.05 %	No
andor social issue Although	1	0.05 %	No
social issue Although humans	1	0.05 %	No
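The occurrence and density columns in the keyword tables above can be reproduced approximately with a short script. The sketch below is only an illustration: `ngram_density` is a hypothetical helper, not the audit tool's code, and splitting on whitespace is a simplifying assumption (the real tool also seems to strip punctuation, given entries such as "andor"). It assumes density is the n-gram count divided by the total word count, which matches the tables: 15 occurrences at 0.75 % implies a page of roughly 2,000 words.

```python
from collections import Counter


def ngram_density(text, n=1, top=5):
    """Return (ngram, count, density %) rows for the most common n-grams.

    Assumption: density = count / total number of words, as a percentage.
    """
    # Whitespace tokenization keeps case and symbols such as "»" intact.
    tokens = text.split()
    total = len(tokens)
    if total == 0:
        return []
    # Slide a window of n tokens across the text to build the n-grams.
    grams = (" ".join(tokens[i:i + n]) for i in range(total - n + 1))
    counts = Counter(grams)
    return [(gram, count, round(100.0 * count / total, 2))
            for gram, count in counts.most_common(top)]


if __name__ == "__main__":
    sample = "natural language processing is natural language fun"
    for row in ngram_density(sample, n=2, top=3):
        print(row)
```

Calling the function with `n=1` through `n=4` on the page text would give the single-, two-, three-, and four-word rankings, although exact counts depend on how the text is tokenized.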

Internal links in - bdewilde.github.io

Anchor text	Target page title
About Me	About Me
Archive	Archive
Intro to Automatic Keyphrase Extraction	Intro to Automatic Keyphrase Extraction
On Starting Over with Jekyll	On Starting Over with Jekyll
Friedman Corpus (3) — Occurrence and Dispersion	Friedman Corpus (3) — Occurrence and Dispersion
Background and Creation	Friedman Corpus (1) — Background and Creation
Data Quality and Corpus Stats	Friedman Corpus (2) — Data Quality and Corpus Stats
While I Was Away	While I Was Away
Intro to Natural Language Processing (2)	Intro to Natural Language Processing (2)
a brief, conceptual overview	Intro to Natural Language Processing (1)
A Data Science Education?	A Data Science Education?
Connecting to the Data Set	Connecting to the Data Set
Data, Data, Everywhere	Data, Data, Everywhere
← previous	Burton DeWilde

Bdewilde.github.io Spined HTML

Burton DeWilde
About Me Archive CV

Intro to Automatic Keyphrase Extraction
2014-09-23 feature design frequency statistics keyphrase extraction graph-based ranking NLP task reformulation
I often apply natural language processing for purposes of automatically extracting structured information from unstructured (text) datasets. One such task is the extraction of important topical words and phrases from documents, commonly known as terminology extraction or automatic keyphrase extraction. Keyphrases provide a concise description of a document’s content; they are useful for document categorization, clustering, indexing, search, and summarization; quantifying semantic similarity with other documents; as well as conceptualizing particular knowledge domains. Read More »

On Starting Over with Jekyll
2014-08-10 blogging DataKind Disqus Harmony Institute Jekyll website design
After another lengthy hiatus from blogging, I’m back! Long story short, I got so frustrated with Blogger’s shortcomings and complications, not to mention the general lack of control over my content, that I lost the will to update my old blog. At the same time, I was putting in longer hours at Harmony Institute and volunteering on the side for DataKind, so I didn’t have much to say outside of official channels. That said, my data life has not gone entirely un-blogged: Read More »

Friedman Corpus (3) — Occurrence and Dispersion
2013-11-03 corpus linguistics dispersion natural language processing occurrence Thomas Friedman
Thus far, I’ve pseudo-justified why a collection of NYT articles by Thomas Friedman would be interesting to study, actually compiled/scraped the text and metadata (see Background and Creation post), improved/verified the quality of the data, and computed a handful of simple, corpus-level statistics (see Data Quality and Corpus Stats post). Now, onward to actual natural language analysis! Read More »

Friedman Corpus (2) — Data Quality and Corpus Stats
2013-10-20 corpus linguistics data quality domain expertise metadata Thomas Friedman
With a full-text Friedman corpus finally in hand (see Background and Creation post), my first task was to verify data quality. Given “Garbage In, Garbage Out”, the fun stuff (analysis! plots! Friedman_ebooks?!) had to wait. Yes, it’s a pain in the ass, but this step is really important. Read More »

Friedman Corpus (1) — Background and Creation
2013-10-15 APIs corpora corpus linguistics natural language processing Thomas Friedman web scraping
Much work in Natural Language Processing (NLP) begins with a large collection of text documents, called a corpus, that represents a written sample of language in a particular domain of study. Corpora come in a variety of flavors: mono- or multi-lingual; category-specific or a representative sampling from a variety of categories, e.g. genres, authors, time periods; simply “plain” text or annotated with additional linguistic information, e.g. part-of-speech tags, full parse trees; and so on. They allow for hypothesis testing and statistical analysis of natural language, but one must be very cautious about applying results derived from a given corpus to other domains. Read More »

While I Was Away
2013-10-05 hackathon Harmony Institute top links treasury.io
I’ve not posted in almost six months, but I was, like, totally busy. Here’s what I’ve been up to: Read More »

Intro to Natural Language Processing (2)
2013-04-16 information extraction natural language processing pos-tagging tokenization web scraping
A couple months ago, I posted a brief, conceptual overview of Natural Language Processing (NLP) as applied to the general task of information extraction (IE), that is, the process of extracting structured data from unstructured data, the majority of which is text. A significant component of my job at HI involves scraping text from websites, press articles, social media, and other sources, then analyzing the quantity and especially quality of the discussion as it relates to a film and/or social issue. Although humans are inarguably better than machines at understanding natural language, it’s impractical for humans to analyze large numbers of documents for themes, trends, content, sentiment, etc., and to do so consistently throughout. This is where NLP comes in. Read More »

A Data Science Education?
2013-03-03 warrant blogs data science education MOOCs Strata
Given that you’re currently reading a data science blog, you’re probably well aware that online resources for an informal education in data science abound. Blogs are a great place to start (here, here, here, here, here), but topics and pedagogical quality are, let’s be honest, scattershot at best. No comment on the usefulness of this particular blog… Read More »

Connecting to the Data Set
2013-02-17 csv soundsystem datafest hackathon money munging networking politics Twitter
As a relative newcomer to the field, I’ve been learning and doing data science largely on my own. This is okay, I guess, given access to Stack Overflow, MOOCs, and a handful of O’Reilly’s textbooks, but not ideal. Fortunately, the data science community here in New York seems to be big and active, so opportunities to connect are plentiful. Read More »

Data, Data, Everywhere
2013-01-19 APIs big data top links free data hackathon web scraping
As I’ve mentioned before, the Internet is a huge (and ever huger!) repository of data. Much of that is in the form of unstructured text (for which natural language processing comes in handy), but an impressive variety of structured datasets can be found and downloaded, too, if you know where to look. Here are some of my favorite sources… Read More »

© 2014 Burton DeWilde. All rights reserved.
