Markovian Retaliation

January 16, 2010

If we assume that the majority of murders are caused by gangs, how would rival gangs retaliate? My assumption is that after a homicide we should see follow-up murders around the same zip code. I have animated one year of homicides in ggplot2.
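One way to put a number on the retaliation idea is to count how often a homicide is followed by another one in the same zip code within a short window. The snippet below is only a minimal sketch in Python, not the animation code from this post; the function name `follow_up_fraction`, the 14-day window, and the toy records are all assumptions made up for illustration.

```python
from datetime import date, timedelta

def follow_up_fraction(events, window_days=14):
    """Fraction of homicides followed by another one in the same
    zip code within window_days. events is a list of (date, zip)."""
    events = sorted(events)  # chronological order
    window = timedelta(days=window_days)
    followed = 0
    for i, (day, zip_code) in enumerate(events):
        for later_day, later_zip in events[i + 1:]:
            if later_day - day > window:
                break  # every remaining event is even later
            if later_zip == zip_code:
                followed += 1
                break
    return followed / len(events) if events else 0.0

# Made-up records for illustration only (not real data).
toy_events = [
    (date(2009, 1, 3), "19132"),
    (date(2009, 1, 9), "19132"),
    (date(2009, 1, 20), "19140"),
    (date(2009, 3, 1), "19132"),
]
toy_fraction = follow_up_fraction(toy_events, window_days=14)
print(toy_fraction)
```

To judge whether the fraction is unusually high, one option is to recompute it after shuffling the zip codes across dates and compare the two numbers.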

Click on the image to see the animated murders

Here is the code. It is not the greatest R code, but it serves the purpose. I’d love to hear from you if you have improved it.

Efficient Estimation via the Generalized Splitting Method.

January 15, 2010

After sitting in on Prof. Dirk Kroese’s presentation, I have been thinking about how we can use this paper for our prediction problem. Let me know if you have some ideas.

A Random-Walk on tombstones

January 15, 2010

As we know, the year has just started, so it is no surprise to see totally random submissions show up on the leaderboard (by the way, is John going to award a January cash prize?).

My very first submission used to show up in 10th place on the leaderboard. Luckily that embarrassment is now off the leaderboard. I know that it was not a very scientific approach, but I’d like to share it here with you, as I always like to try this approach first.

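For the first submission I used what I loosely call an importance sampling technique: I draw samples from a probability space according to a distribution. For example, if my population is normally distributed with mean M and variance V, then as the number of samples grows, the sample mean and variance converge to M and V. (Strictly speaking, drawing directly from the target distribution is plain Monte Carlo sampling; importance sampling proper draws from a different proposal distribution and reweights the samples, so please look up the correct definition.)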
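The convergence claim is easy to check numerically. Here is a minimal sketch in Python (the spreadsheet itself is not reproduced here); the helper name `sample_moments` and the values M = 10 and V = 4 are assumptions chosen only for the demonstration.

```python
import random

def sample_moments(mean, variance, n, seed=0):
    """Draw n normal samples and return (sample mean, sample variance)."""
    rng = random.Random(seed)
    sd = variance ** 0.5
    draws = [rng.gauss(mean, sd) for _ in range(n)]
    m = sum(draws) / n
    v = sum((x - m) ** 2 for x in draws) / n
    return m, v

M, V = 10.0, 4.0  # hypothetical population mean and variance
for n in (100, 10_000, 100_000):
    # The estimates should drift toward M and V as n grows.
    print(n, sample_moments(M, V, n))
```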

It may sound like a joke, but it is a very useful technique when you use it on real outcomes. People design AI algorithms and chatbots with this simple method and hope to win the Turing award someday. One of my predictions is that the team that wins the Turing award will have some kind of importance sampling method at its core.

One of the advantages of this technique is that the sample inherits the properties of the original population. For example, if your original population shows Markovian behavior, your sampled population will show it too. So you capture those behaviors without knowing anything about the underlying mechanisms.
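As an illustration of the "inherits the properties" idea, here is a small Python sketch that resamples with replacement from an observed population and compares the shares. The `resample` helper and the zip code history are made up for this example and are not the contents of the spreadsheet.

```python
import random
from collections import Counter

def resample(population, n, seed=0):
    """Draw n items from the observed population, with replacement."""
    rng = random.Random(seed)
    return rng.choices(population, k=n)

# Hypothetical history: one zip code per past homicide (made-up values).
history = ["19132"] * 50 + ["19140"] * 30 + ["19104"] * 20

draws = resample(history, 10_000)
shares = {z: c / len(draws) for z, c in Counter(draws).items()}
print(shares)  # should sit close to 0.5, 0.3 and 0.2
```

Note that independent draws like these reproduce the overall frequencies. To keep order-dependent (Markovian) structure, one would have to resample transitions or whole stretches of the sequence instead of single records.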

Anyway, here is my importance sampling code. I’d like to think that no one will cheat in this competition, but theoretically, if you make a number of fake accounts and submit the results of this spreadsheet every day with your many accounts, you will have a really high chance of winning (or you can put together a very large team to submit them so that it is not against the rules, a totally kosher approach). Just press F9 to produce a new submission. (This claim comes from our Darwinian world view that an infinite number of monkeys randomly tapping on their keyboards can reproduce Shakespeare’s work.)

Copyright: You are welcome to use, share or modify this spreadsheet. If you end up winning the whole competition, I would appreciate some sort of acknowledgement.

Here is the link

GIS in R

January 12, 2010

I used O’Reilly’s “Data Mashups in R” to generate the following maps from the homicide data.

Homicides Locations (from 2007 to 2009)


Homicides Heat Map


Let me know if you want the code or the dataset.

Cheers,

Siah (www.twitter.com/siah)

I. Homicides as non-homogeneous Poisson processes

January 11, 2010

Note: If you are an 18-year-old black man living in Philly in July, please be more careful. You have the highest likelihood of being a victim of a homicide.

I am at the house of two of my friends and they are busy, so I thought that in the meantime I would do some data modeling with R for the Analytics X Prize.

I am still trying to build some intuition about how crimes happen and why. Here you can get some good information about the details of all homicides in the Philly area. I have compiled a long dataset from this information. There is no location information, though, but location is not the focus of this analysis.

Here are some descriptive statistics about the dataset. (They describe the victims, those who were killed, not the killers.)

Gender Count
Male 1604
Female 205

And some information about the ethnicity of victims

Ethnicity Count
White 319
Black 1451
Asian 36

And the number of homicides per month

Month Count
1 132
2 121
3 156
4 157
5 148
6 158
7 172
8 160
9 148
10 168
11 150
12 140

Homicides per month in a chart (do you notice it is kind of periodic?)

Age distribution of the victims:

I tried to model homicides as non-homogeneous Poisson processes:

Here are my assumptions. I assume homicides follow a Poisson process, meaning that the times between consecutive occurrences are exponentially distributed. To make it look more sophisticated, I assume people commit homicide at different rates depending on the time of the year. This is where the non-homogeneous model comes in. Unfortunately the rates are not that different throughout the year, so for the time being I just assume that the rate is constant throughout the year :)
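A quick way to see how far the monthly counts above are from a constant rate is a chi-square goodness-of-fit check. The snippet below is a rough sketch in Python, not my R script. It uses the monthly counts from the table, assumes the expected count for each month is proportional to its number of days (leap days ignored), and the function name is made up for the example.

```python
# Monthly homicide counts from the table above (January to December).
MONTHLY_COUNTS = [132, 121, 156, 157, 148, 158, 172, 160, 148, 168, 150, 140]
# Assumption: month lengths of a non-leap year.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def chi_square_constant_rate(counts, days):
    """Chi-square statistic against a constant daily rate."""
    total = sum(counts)
    total_days = sum(days)
    stat = 0.0
    for observed, d in zip(counts, days):
        expected = total * d / total_days
        stat += (observed - expected) ** 2 / expected
    return stat

chi2 = chi_square_constant_rate(MONTHLY_COUNTS, DAYS_IN_MONTH)
CRITICAL_5PCT_11DF = 19.675  # chi-square critical value, 11 d.o.f., 5% level
print(round(chi2, 2), chi2 > CRITICAL_5PCT_11DF)
```

The statistic comes out at roughly 11.6, below the 5% critical value, so under these assumptions the monthly counts alone do not rule out a constant rate.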

If you are interested in modeling an NHPP, you can use the glm function in R (for example with a Poisson family), or send me an email and I will send you my little script.
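For anyone who wants to play with the idea before asking for the script, here is a minimal Python sketch of simulating a non-homogeneous Poisson process with the thinning method: generate candidate events with exponential gaps at a maximum rate, then keep each candidate with probability rate(t) / rate_max. This is not my R script and not the glm approach. The function names and the toy rates (1 per day in the first half of the year, 2 per day in the second half) are assumptions for illustration only.

```python
import random

def simulate_nhpp(rate_fn, rate_max, horizon, seed=0):
    """Simulate event times on [0, horizon) by thinning.
    rate_fn(t) must never exceed rate_max."""
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(rate_max)  # exponential gap at the max rate
        if t >= horizon:
            return events
        if rng.random() < rate_fn(t) / rate_max:
            events.append(t)  # accept the candidate event

def toy_rate(day):
    """Hypothetical daily rate: higher in the second half of the year."""
    return 2.0 if day >= 182 else 1.0

event_times = simulate_nhpp(toy_rate, rate_max=2.0, horizon=365.0)
first_half = sum(1 for t in event_times if t < 182)
second_half = len(event_times) - first_half
print(first_half, second_half)
```

With a constant `rate_fn` the same function reduces to an ordinary (homogeneous) Poisson process, which is the simplification I ended up using.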

Cheers,
Siah


