AI3 Regular Blog


Jan 12, 2013

I've been blogging more than usual since I released AI3 on Christmas Eve. You should check it out. Of all the websites I have released, AI3 has the most potential and should get the most respect. I purchased a super-fast server (with an SSD especially for fast database lookups), leased a super-fast colo space for it, and am going to add to it regularly. As a feature of AI3, I will attempt to keep a regular blog here with insight into what I think about each feature of the website, and then I will make a page with that data on AI3 using a simple slug. I've already done a few, if you want to look at the past few blog posts.

The feature that I'm going to discuss today is single-minded research of a single difficult topic. Searching for a common word in Google can be one of the most frustrating things in the world. What you really want is for someone to answer the question you are asking, not to learn every way your question can be misunderstood. Sometimes AI3 will fail; there's no doubt that Google is more in depth than anything I can create, even if I had all of Wikipedia.

So let's get in depth on a very simple question. It's not one of the easy questions I've been dealing with. Let's ask: "Is the word 'We' used more positively or negatively?" By that, I mean "Is the sentence 'We plan to solve poverty by 2017' more common than 'We can not solve poverty by 2017'?" And not just that one sentence, but every sentence of the positive form "We *verb*" vs. the negative form "We *verb* not". This is a deviously difficult problem. Even with a huge corpus, a definitive answer requires statistical analysis of a ton of sentences. Let's attempt it though.

Start with We and we. All words in AI3 are case-sensitive, which is why there are links to all variants of we on the We word page. 1276 pages is too many unless we have a script. Let's try collocation of We. It's a slow process because We is such a common word. You can look below if you're impatient. While you're waiting, maybe try looking at a few sentences. The second sentence is:

`` We didn't want town work '', Jones said.
Eureka already? Yup. All we need to do is find sentences that pair We with every word that is in the negative. That's pretty easy, right? There are only four pages of words that contain n't and most of them are pretty uncommon. Note that there's a bug where two words joined by a dash are treated as one word. That's a problem with my parser, which should be more intelligent about whitespace. So manually or automatically, we can start searching for sentences that contain We didn't and so on.

Since the related page doesn't have a count (due to slowness), we are stuck just trying a high page number and using a binary search from there. If you don't know what a binary search is, let me explain. Let's say that there could be upwards of 100 pages of sentences or more. Simply skip to page 100. If it gives you an error, then there aren't that many pages. Go to half that number, page 50. Halve the number again and again until you come upon a valid page. Then pick a number halfway between the valid page and the invalid page, and keep narrowing the gap until the valid page and the invalid page are right next to each other. After a few hits, you will find that page 6 is the end of We didn't. In total, it should only take about 7 tries to find any number between 1 and 100 because 2^7 is 128. If you don't understand the math, hopefully you'll understand the process.

Anyway, now we have a way of counting all the negative sentences. Then we simply need to count all the sentences that contain We. That can be found on the We word page. But let's say that you thought this algorithm through and have some skill with a database. How long would it take you to come up with the solution?
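If you'd rather let a script do the page hunting, here is a minimal sketch of the binary search described above. It is not AI3's own code. The `page_exists` function is a hypothetical stand-in for however you check that a page number doesn't return an error, and the fake checker at the bottom just pretends the last page is 6, like We didn't.

```python
from typing import Callable


def last_valid_page(page_exists: Callable[[int], bool], guess: int = 100) -> int:
    """Return the highest page number that exists, or 0 if there are no pages.

    page_exists is a hypothetical callback: it should return True when the
    given page number loads and False when the site reports an error.
    Assumes pages are contiguous: if page n exists, so do pages 1..n-1.
    """
    # Start from a high guess. If even that page exists, keep doubling
    # until we land on a page that does not exist.
    hi = guess
    while page_exists(hi):
        hi *= 2

    # Invariant: lo is a valid page (or 0, meaning "nothing found yet")
    # and hi is an invalid page. Halve the gap until they are neighbors.
    lo = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if page_exists(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Fake checker for illustration only: pretend there are exactly 6 pages.
    def fake_page_exists(n: int) -> bool:
        return 1 <= n <= 6

    print(last_valid_page(fake_page_exists))
```

The number of page loads grows with the logarithm of the starting guess rather than with the guess itself, which is the whole point of halving. Once you know the last page, you can estimate the sentence count from the number of sentences per page, then count the last page by hand.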


A Month after Brasil

It's been a month since I went to Brasil. I am planning on going back, and I'm learning as quickly as I can. It's likely that I won't be able to make it back until next winter, but I will plan on it. I need to stay in touch with the friends I made over there. There are many conferences that I can attend to make my stay work-related, but the plane ticket is my main expense. I'm planning on keeping my Brasilian telephone number and giving it to my friends so that they can call me for cheap or free. Of course they can call me on Skype for free as well. We're lucky that we live in such a well-connected society; it's just up to us to stay in touch.

A video I watched today said that Vila Prudente is a favela. I actually visited that neighborhood while I was there and didn't think it was a favela. If that is the definition of a favela, then my eyes deceive me. Certainly the neighborhood may be much poorer than some of the neighborhoods I visited, but it looks quite beautiful (see the street view if you want to know what I mean). Maybe that is the definition of the favela, poverty in a beautiful place. It didn't connect with me that there would be any crime in that neighborhood. The video is about how the residents are getting people involved with documentary films.

What's new with me? Well, since I'm back in Seattle, I may start up yet another blog at blog.altsci.com (not started yet) which will keep a little more info on my day-to-day and will collect all the other blogs. One problem I have is that I have too many blogs. On one hand it's good to separate topics, but on the other hand most people who visit my blog are looking for me rather than my topic. I would love to attract more people interested in my subject matter, but maybe I should post more subject matter. I can do that.


Improvements

Over the weekend I made massive improvements to my blog. I chose a theme and applied it quite well. I made a logo which is not really a logo but the name of the site using my AltSci font. I'm not too thrilled about the splash part on the main page but I'll figure out what I should do later. The comment system is working (I even caught a bug in Django while I was at it). The quote system is up on the front page and the About page.

So what's new on the Brazilian front? I have downloaded 64 PDFs and 72 MP3s from Busuu and am turning them into a study guide for myself. At some point I intend to compile this data from my mind into lessons for English speakers. It could also be used for Brazilian Portuguese speakers to learn English. I'll have to see if I meet anyone who can test it out. Currently my setup is 2 pages and I have enough data for 5-10 pages. I plan to only bring 5 pages with me though. I'll be traveling light to keep to only one bag. If I were more confident of where I was going and how I could get there, I would probably take a light duffel bag. Having carried it on enough trips, I don't want that extra weight. Since my wrist is broken I won't be able to switch hands, which would be annoying if I carry anything heavy.

Limits are not necessarily bad. My website is limited by the time I have available. Though I could have saved time and made it a copy of my other blog, I decided that I need a bit of Python on my website.

Now seems like as good a time as any to advertise my other blogs and my projects. Currently I have a really cool project that is already making some steps. It's also written in Python Django and has about 105GB more data than this blog. Almost none of it is original, but don't let that stop you from visiting Philosophical Transactions. A blog post here wouldn't be complete without a link to my normal blog and my previous travel blogs: AltSci Europe and AltSci Japan.


AltSci Cell - About

Cell is place.
Welcome to Cell.

