bdewilde.github.io - Data, Data, Everywhere









data scientist / physicist / filmmaker

SEO audit: Content analysis

Language Error! No language localisation is found.
Title Data, Data, Everywhere
Text / HTML ratio 51 %
Frame Excellent! The website does not use iFrame solutions.
Flash Excellent! The website does not have any flash contents.
Keywords cloud data access APIs datasets list Burton —– API DeWilde government free Data web scraping training Infochimps downloaded occasional big structured
Keywords consistency
Keyword Content Title Description Headings
data 15
access 7
APIs 7
datasets 6
list 4
Burton 3
Headings
H1 H2 H3 H4 H5 H6
1 0 0 0 0 0
Images We found 0 images on this web page.

SEO Keywords (Single)

Keyword Occurrence Density
data 15 0.75 %
access 7 0.35 %
APIs 7 0.35 %
datasets 6 0.30 %
list 4 0.20 %
Burton 3 0.15 %
—– 3 0.15 %
API 3 0.15 %
DeWilde 3 0.15 %
government 3 0.15 %
free 3 0.15 %
Data 3 0.15 %
web 2 0.10 %
scraping 2 0.10 %
training 2 0.10 %
Infochimps 2 0.10 %
downloaded 2 0.10 %
occasional 2 0.10 %
big 2 0.10 %
structured 2 0.10 %

SEO Keywords (Two Word)

Keyword Occurrence Density
of the 3 0.15 %
to access 3 0.15 %
can be 3 0.15 %
Burton DeWilde 2 0.10 %
to a 2 0.10 %
of data 2 0.10 %
on the 2 0.10 %
provide a 2 0.10 %
data and 2 0.10 %
in a 2 0.10 %
access the 2 0.10 %
APIs to 2 0.10 %
from the 2 0.10 %
structured datasets 2 0.10 %
datasets can 2 0.10 %
and occasional 2 0.10 %
you can 2 0.10 %
to get 2 0.10 %
access to 2 0.10 %
a great 2 0.10 %

SEO Keywords (Three Word)

Keyword Occurrence Density Possible Spam
datasets can be 2 0.10 % No
APIs to access 2 0.10 % No
Burton DeWilde About 1 0.05 % No
nowhere near a 1 0.05 % No
interesting Obviously this 1 0.05 % No
Obviously this is 1 0.05 % No
this is nowhere 1 0.05 % No
is nowhere near 1 0.05 % No
near a comprehensive 1 0.05 % No
to something interesting 1 0.05 % No
a comprehensive list 1 0.05 % No
comprehensive list and 1 0.05 % No
list and many 1 0.05 % No
and many others 1 0.05 % No
many others have 1 0.05 % No
others have made 1 0.05 % No
have made longerbetter 1 0.05 % No
something interesting Obviously 1 0.05 % No
occasional pointers to 1 0.05 % No
pointers to something 1 0.05 % No

SEO Keywords (Four Word)

Keyword Occurrence Density Possible Spam
Burton DeWilde About Me 1 0.05 % No
nowhere near a comprehensive 1 0.05 % No
to something interesting Obviously 1 0.05 % No
something interesting Obviously this 1 0.05 % No
interesting Obviously this is 1 0.05 % No
Obviously this is nowhere 1 0.05 % No
this is nowhere near 1 0.05 % No
is nowhere near a 1 0.05 % No
near a comprehensive list 1 0.05 % No
occasional pointers to something 1 0.05 % No
a comprehensive list and 1 0.05 % No
comprehensive list and many 1 0.05 % No
list and many others 1 0.05 % No
and many others have 1 0.05 % No
many others have made 1 0.05 % No
others have made longerbetter 1 0.05 % No
have made longerbetter lists 1 0.05 % No
pointers to something interesting 1 0.05 % No
and occasional pointers to 1 0.05 % No
longerbetter lists of their 1 0.05 % No

Internal links in - bdewilde.github.io

About Me
Archive
Intro to Automatic Keyphrase Extraction
On Starting Over with Jekyll
Friedman Corpus (3) — Occurrence and Dispersion
Background and Creation
Friedman Corpus (1) — Background and Creation
Data Quality and Corpus Stats
Friedman Corpus (2) — Data Quality and Corpus Stats
While I Was Away
Intro to Natural Language Processing (2)
a brief, conceptual overview
Intro to Natural Language Processing (1)
A Data Science Education?
Connecting to the Data Set
Data, Data, Everywhere
Burton DeWilde

Bdewilde.github.io Spined HTML


Data, Data, Everywhere
2013-01-19
APIs, big data, top links, free data, hackathon, web scraping

As I’ve mentioned before, the Internet is a huge (and ever huger!) repository of data. Much of that is in the form of unstructured text, for which natural language processing comes in handy, but an impressive variety of structured datasets can be found and downloaded, too, if you know where to look. Here are some of my favorite sources…

Data.gov: Free access to thousands of datasets maintained by the U.S. Federal government, from White House visitor logs to cantaloupe statistics (seriously), with pretty good search functionality. Dozens of other national governments offer something similar, e.g. data.gov.uk, as does the United Nations at UNdata.

Census.gov: The U.S. Census Bureau offers free access to a wealth of demographics data, from the full decennial census to detailed American Community Surveys. Check out American FactFinder for fancy filtering that enables you to inspect very specific slices of the population.

Infochimps: In addition to providing a platform for big data analysis, Infochimps maintains a “data marketplace” where thousands of free and paid datasets can be downloaded or accessed via API, from a Twitter census to a corpus of several thousand erotica stories (for NLP training, of course…).

Datamob: “Public data put to good use.” Aggregates hundreds of data sources (plus apps and other resources) with tags and descriptions, covering sports, government, media, science, etc. Check out the tags list.

The New York Times: Provides tens of APIs to access the paper’s extensive article archives, Congressional records, NYC real estate sales data, etc. They even provide a handy API Tool to test out your queries.

The Guardian: Besides having a great data blog, they also make all of the data available to the public! Here’s a full list of their datasets covering a wide range of topics.

Sunlight Labs: Data arm of Sunlight Foundation, dedicated to government accountability and transparency. Provides several APIs to access data on state legislatures, campaign contributions, and the words actually spoken on the record in Congress. As it turns out, they’re hosting a hackathon in a couple weeks, and I’ll be there! :)

Reddit: Hosts a datasets archive filled with the sort of thing you might expect from the front page/gutter of the internet: 10,000 images of cats (for training a classifier…?), people with questions and trolls with answers, and occasional pointers to something interesting.

Obviously, this is nowhere near a comprehensive list, and many others have made longer/better lists of their own, e.g. famous data scientists Peter Skomoroch and Hilary Mason. If you find yourself looking for but unable to find a particular dataset, Google is your mostly-not-evil friend.

In my list I mentioned a few APIs (Application Programming Interfaces), which, I should note, are distinct from structured datasets ready for download. Web APIs provide a relatively consistent and stable way to access a website’s data and return it in a standardized format like XML or JSON. If done well, APIs can be a great help, although they do have some drawbacks: registration to get an official access key, limits on the rate at which you can access the data, and occasional clamp-downs. Still, it’s nice to have options; to get you going, here’s a massive API directory and a brand-new Codecademy learning track specifically on APIs in Javascript, Python, and Ruby.

And when all else fails, you can always fall back on web scraping. In fact, some prefer it that way…

The data’s out there. Happy fetching! :)

© 2014 Burton DeWilde. All rights reserved.
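The post's remarks on web APIs (data returned as JSON, a registered access key, limits on request rate) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and does not come from any of the services listed: the `api.example.com` URL, the `api-key` parameter name, and the `results`/`title` shape of the response are hypothetical, and a canned string stands in for a live HTTP call.

```python
import json
import time
from urllib.parse import urlencode


def build_url(base_url, params, api_key):
    """Attach query parameters and a registered API key to a base URL."""
    query = dict(params)
    query["api-key"] = api_key  # hypothetical parameter name; real APIs vary
    return f"{base_url}?{urlencode(query)}"


class RateLimiter:
    """Enforce a minimum interval between calls to stay under a rate limit."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


def parse_titles(raw_json):
    """Pull titles out of a JSON response body (assumed response shape)."""
    payload = json.loads(raw_json)
    return [item["title"] for item in payload["results"]]


# A canned response stands in for a live HTTP call to a hypothetical endpoint.
sample_response = '{"results": [{"title": "Open Data"}, {"title": "APIs 101"}]}'
url = build_url("https://api.example.com/search", {"q": "census"}, "MY_KEY")
titles = parse_titles(sample_response)
```

In practice you would call `limiter.wait()` before each real request so that successive calls stay under whatever limit the provider documents; the actual parameter names and response fields have to come from that provider's own API reference.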