bdewilde.github.io - Data, Data, Everywhere

Search Preview

Data, Data, Everywhere

bdewilde.github.io
data scientist / physicist / filmmaker

SEO audit: Content analysis

Language

Error! No language localisation is found.

Title

Data, Data, Everywhere

Text / HTML ratio

51 %

Frame

Excellent! The website does not use iFrame solutions.

Flash

Excellent! The website does not have any flash contents.

Keywords cloud

data access APIs datasets list Burton —– API DeWilde government free Data web scraping training Infochimps downloaded occasional big structured

Keywords consistency

Keyword	Content	Title	Description	Headings
data	15
access	7
APIs	7
datasets	6
list	4
Burton	3

Headings

H1	H2	H3	H4	H5	H6
1	0	0	0	0	0

Images

We found 0 images on this web page.

SEO Keywords (Single)

Keyword	Occurrence	Density
data	15	0.75 %
access	7	0.35 %
APIs	7	0.35 %
datasets	6	0.30 %
list	4	0.20 %
Burton	3	0.15 %
—–	3	0.15 %
API	3	0.15 %
DeWilde	3	0.15 %
government	3	0.15 %
free	3	0.15 %
Data	3	0.15 %
web	2	0.10 %
scraping	2	0.10 %
training	2	0.10 %
Infochimps	2	0.10 %
downloaded	2	0.10 %
occasional	2	0.10 %
big	2	0.10 %
structured	2	0.10 %

SEO Keywords (Two Word)

Keyword	Occurrence	Density
of the	3	0.15 %
to access	3	0.15 %
can be	3	0.15 %
Burton DeWilde	2	0.10 %
to a	2	0.10 %
of data	2	0.10 %
on the	2	0.10 %
provide a	2	0.10 %
data and	2	0.10 %
in a	2	0.10 %
access the	2	0.10 %
APIs to	2	0.10 %
from the	2	0.10 %
structured datasets	2	0.10 %
datasets can	2	0.10 %
and occasional	2	0.10 %
you can	2	0.10 %
to get	2	0.10 %
access to	2	0.10 %
a great	2	0.10 %

SEO Keywords (Three Word)

Keyword	Occurrence	Density	Possible Spam
datasets can be	2	0.10 %	No
APIs to access	2	0.10 %	No
Burton DeWilde About	1	0.05 %	No
nowhere near a	1	0.05 %	No
interesting Obviously this	1	0.05 %	No
Obviously this is	1	0.05 %	No
this is nowhere	1	0.05 %	No
is nowhere near	1	0.05 %	No
near a comprehensive	1	0.05 %	No
to something interesting	1	0.05 %	No
a comprehensive list	1	0.05 %	No
comprehensive list and	1	0.05 %	No
list and many	1	0.05 %	No
and many others	1	0.05 %	No
many others have	1	0.05 %	No
others have made	1	0.05 %	No
have made longerbetter	1	0.05 %	No
something interesting Obviously	1	0.05 %	No
occasional pointers to	1	0.05 %	No
pointers to something	1	0.05 %	No

SEO Keywords (Four Word)

Keyword	Occurrence	Density	Possible Spam
Burton DeWilde About Me	1	0.05 %	No
nowhere near a comprehensive	1	0.05 %	No
to something interesting Obviously	1	0.05 %	No
something interesting Obviously this	1	0.05 %	No
interesting Obviously this is	1	0.05 %	No
Obviously this is nowhere	1	0.05 %	No
this is nowhere near	1	0.05 %	No
is nowhere near a	1	0.05 %	No
near a comprehensive list	1	0.05 %	No
occasional pointers to something	1	0.05 %	No
a comprehensive list and	1	0.05 %	No
comprehensive list and many	1	0.05 %	No
list and many others	1	0.05 %	No
and many others have	1	0.05 %	No
many others have made	1	0.05 %	No
others have made longerbetter	1	0.05 %	No
have made longerbetter lists	1	0.05 %	No
pointers to something interesting	1	0.05 %	No
and occasional pointers to	1	0.05 %	No
longerbetter lists of their	1	0.05 %	No

Internal links in - bdewilde.github.io

About Me
Archive
Intro to Automatic Keyphrase Extraction
On Starting Over with Jekyll
Friedman Corpus (3) — Occurrence and Dispersion
Friedman Corpus (1) — Background and Creation
Friedman Corpus (2) — Data Quality and Corpus Stats
While I Was Away
Intro to Natural Language Processing (2)
Intro to Natural Language Processing (1): a brief, conceptual overview
A Data Science Education?
Connecting to the Data Set
Data, Data, Everywhere
Burton DeWilde (← previous)

Bdewilde.github.io Spined HTML

Data, Data, Everywhere

Burton DeWilde, 2013-01-19
Tags: APIs, big data, top links, free data, hackathon, web scraping

As I've mentioned before, the Internet is a huge (and ever huger!) repository of data. Much of that is in the form of unstructured text, for which natural language processing comes in handy, but an impressive variety of structured datasets can be found and downloaded, too, if you know where to look. Here are some of my favorite sources...

Data.gov: Free access to thousands of datasets maintained by the U.S. Federal government, from White House visitor logs to cantaloupe statistics (seriously), with pretty good search functionality. Dozens of other national governments offer something similar, e.g. data.gov.uk, as does the United Nations at UNdata.

Census.gov: The U.S. Census Bureau offers free access to a wealth of demographics data, from the full decennial census to detailed American Community Surveys. Check out American FactFinder for fancy filtering that enables you to inspect very specific slices of the population.

Infochimps: In addition to providing a platform for big data analysis, Infochimps maintains a "data marketplace" where thousands of free and paid datasets can be downloaded or accessed via API, from a Twitter census to a corpus of several thousand erotica stories (for NLP training, of course...).

Datamob: "Public data put to good use." Aggregates hundreds of data sources (plus apps and other resources) with tags and descriptions, covering sports, government, media, science, etc. Check out the tags list.

The New York Times: Provides tens of APIs to access the paper's extensive article archives, Congressional records, NYC real estate sales data, etc. They even provide a handy API Tool to test out your queries.

The Guardian: Besides having a great data blog, they also make all of the data available to the public! Here's a full list of their datasets covering a wide range of topics.

Sunlight Labs: Data arm of the Sunlight Foundation, dedicated to government accountability and transparency. Provides several APIs to access data on state legislatures, campaign contributions, and the words actually spoken on the record in Congress. As it turns out, they're hosting a hackathon in a couple weeks, and I'll be there! :)

Reddit: Hosts a datasets archive filled with the sort of thing you might expect from the front page/gutter of the internet: 10,000 images of cats (for training a classifier...?), people with questions and trolls with answers, and occasional pointers to something interesting.

Obviously, this is nowhere near a comprehensive list, and many others have made longer/better lists of their own, e.g. famous data scientists Peter Skomoroch and Hilary Mason. If you find yourself looking for but unable to find a particular dataset, Google is your mostly-not-evil friend.

In my list I mentioned a few APIs (Application Programming Interfaces) which, I should note, are distinct from structured datasets ready for download. Web APIs provide a relatively consistent and stable way to access a website's data and return it in a standardized format like XML or JSON. If done well, APIs can be a great help, although they do have some drawbacks: registration to get an official access key, limits on the rate at which you can access the data, and occasional clamp-downs. Still, it's nice to have options; to get you going, here's a massive API directory and a brand-new Codecademy learning track specifically on APIs in JavaScript, Python, and Ruby.

And when all else fails, you can always fall back on web scraping. In fact, some prefer it that way...

The data's out there. Happy fetching! :)

Burton DeWilde
data scientist / physicist / filmmaker
© 2014 Burton DeWilde. All rights reserved.
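
The post notes that web APIs usually return JSON or XML, require an access key, and limit how often you can call them. A minimal Python sketch of that pattern follows. It is an illustration only, not the interface of any provider listed above: the `api-key` parameter name, the `RateLimitedClient` class, and the `api.example.com` URL are all hypothetical, and real services document their own parameter names and limits.

```python
import json
import time
import urllib.parse
import urllib.request


def build_url(base_url, params, api_key):
    """Append query parameters plus the access key to a base URL.

    The "api-key" parameter name is an assumption; each provider
    documents its own.
    """
    query = urllib.parse.urlencode({**params, "api-key": api_key})
    return f"{base_url}?{query}"


def fetch_text(url):
    """Download a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


class RateLimitedClient:
    """Call a JSON web API at most once per `min_interval` seconds."""

    def __init__(self, base_url, api_key, min_interval=1.0,
                 fetch=fetch_text, clock=time.monotonic, sleep=time.sleep):
        self.base_url = base_url
        self.api_key = api_key
        self.min_interval = min_interval
        # fetch, clock, and sleep are injectable so the client can be
        # exercised without network access or real waiting.
        self.fetch = fetch
        self.clock = clock
        self.sleep = sleep
        self._last_call = None

    def get(self, **params):
        """Wait if the previous call was too recent, then fetch and parse JSON."""
        if self._last_call is not None:
            wait = self.min_interval - (self.clock() - self._last_call)
            if wait > 0:
                self.sleep(wait)
        self._last_call = self.clock()
        url = build_url(self.base_url, params, self.api_key)
        return json.loads(self.fetch(url))


if __name__ == "__main__":
    # Hypothetical endpoint and key; substitute a real provider's values.
    client = RateLimitedClient("https://api.example.com/search", "YOUR-KEY")
    print(build_url(client.base_url, {"q": "census"}, client.api_key))
```

Spacing calls on the client side is only one way to stay under a provider's limit. Services that publish a daily quota or return a "too many requests" status would need extra handling that this sketch leaves out.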
