I've been meaning to post this for a while - hopefully someone manages to do something cool with it! Posting anonymously because I'm not sure how protective the UD founders are, but I think they'd be cool with the data science community playing around with it.

Each word is on its own line. Here's the last line of the file to give you an idea of the structure:

Once you've extracted the file, you can run tail -n 1 words.json on Linux (or Mac?) to have a look yourself. You can see that same definition with the API here:

Note that the _id property was added by me and can be disregarded.

Unfortunately it's missing the last year of data, because it was scraped in May 2016, but perhaps someone will be able to grab the last year's worth and throw it in the comments if there's enough interest in this dataset. You can scrape word ids from here: (note that each date has many pages) and then just throw them into the Urban Dictionary API link above. Your best bet would be to start one month earlier (April 2016) and skip any id that's already in the DB, because I'm not exactly sure what date in May I finished.

If someone could upload mirrors or make a torrent out of it and post it in the comments, that'd be great, because I won't be able to personally host it for more than a couple of months. Also, it's a 7z file, which will be a bit weird for some, but there are freeware/open-source extractors for it on every major platform - just head over to Google. Please post a link in the comments if you've made another mirror.

I highly suspect that a lot of cultural insights are hidden in that data. As a simple example, you can try to automatically tag each definition with categories for the word and categories for the definition. Then check how the trend of using innocent words for sex acts compares over time with the trend of using sex-related words for innocent things. Or check when people started entering definitions for celebrities, and what those celebrities were defined as. It gets even more interesting when you have datasets for other parts of the internet.
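Since each word sits on its own line, the file can be streamed one entry at a time instead of being loaded whole. The following is a minimal sketch, assuming every line is a standalone JSON object (the post suggests this but does not spell it out). The helper names iter_entries and last_entry are made up for illustration.

```python
import json
from typing import Iterator, Optional


def iter_entries(path: str) -> Iterator[dict]:
    """Yield one parsed entry per non-empty line of the file."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                # Assumption: each line is a complete JSON object.
                yield json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(f"line {line_number} is not valid JSON") from error


def last_entry(path: str) -> Optional[dict]:
    """Return the final entry in the file, or None if the file is empty."""
    last = None
    for entry in iter_entries(path):
        last = entry
    return last
```

Calling last_entry("words.json") should give roughly what tail shows for the last line, already parsed into a dict, and looping over iter_entries keeps memory use flat even for a large dump.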
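The backfill idea (start from April 2016 and ignore anything already in the DB) comes down to a skip-if-known loop. Here is a rough sketch under stated assumptions: candidate_ids stands for whatever ids you scraped from the date pages, known_ids for the ids already in your copy of the data, and fetch is a hypothetical callable you would write yourself to call the API. The URLs are not reproduced here, so no network code is shown.

```python
from typing import Callable, Iterable, List, Optional, Set


def backfill(
    candidate_ids: Iterable[int],
    known_ids: Set[int],
    fetch: Callable[[int], Optional[dict]],
) -> List[dict]:
    """Fetch only the ids that are not already known.

    `fetch` is a placeholder for your own API call; it should return
    the entry as a dict, or None if the id could not be retrieved.
    """
    new_entries = []
    for word_id in candidate_ids:
        if word_id in known_ids:
            continue  # already in the DB, so skip it
        entry = fetch(word_id)
        if entry is None:
            continue  # nothing came back for this id
        known_ids.add(word_id)  # also guards against repeated ids
        new_entries.append(entry)
    return new_entries
```

Because known_ids is updated as it goes, starting a month early costs little: overlapping ids are skipped before any request is made.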
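The tagging idea can be prototyped with plain keyword lists before reaching for anything fancier. This is only a sketch. The field names word, definition and written_on are assumptions about the entry structure, written_on is assumed to start with a four-digit year, and the categories are neutral placeholders to swap for whichever ones you want to study.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set

# Placeholder categories and keywords; replace with your own lists.
CATEGORY_KEYWORDS = {
    "food": {"pizza", "taco", "sandwich"},
    "sports": {"football", "goal", "team"},
}


def tag(text: str) -> Set[str]:
    """Return every category whose keywords appear in the text."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return {
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if tokens & keywords
    }


def category_pairs_by_year(entries: Iterable[dict]) -> Dict[str, Counter]:
    """Count (word category, definition category) pairs for each year."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for entry in entries:
        year = entry["written_on"][:4]  # assumed ISO-style timestamp
        word_tags = tag(entry["word"]) or {"untagged"}
        definition_tags = tag(entry["definition"]) or {"untagged"}
        for word_tag in word_tags:
            for definition_tag in definition_tags:
                counts[year][(word_tag, definition_tag)] += 1
    return counts
```

Plotting counts[year][(a, b)] against counts[year][(b, a)] year by year gives the kind of comparison described above: how often words from one category get definitions from another, and the reverse.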