tl;dr Using a Naïve Bayesian classifier and a dataset of 1515 video game ratings, I am predicting which developer is most likely to make a game with specific properties (metascore, ESRB rating, genre, platform) in the future. Naïve Bayesian learning A Naïve Bayes classifier is a very simple method to predict categorial outcomes. A well-known application is text classification, especially predicting whether a text is spam. Here, the classifier tries to use the information about the occurrence of certain words in telling us whether an e-mail message is spam or not.
Let’s make this a quick and quite basic one. There is this incredibly useful function in R called ifelse(). It’s basically a vectorized version of an if … else control structure every programming language has in one way or the other. ifelse() has, in my view, two major advantages over if … else: It’s super fast. It’s more convenient to use. The basic idea is that you have a vector of values and whenever you want to test these values against some kind of condition, you want to have a specific value in another vector.
There is a game I play / try to make myself okay / try so hard to make the pieces all fit / smash it apart / just for the f**k of it (Nine Inch Nails: The Big Come Down) After this rather distressing opening by the Nine Inch Nails, let’s turn to a more uplifting topic: video games! There is a dataset on Kaggle with ratings for over 4000 video games.
What we want to do Recently, I used a correspondence analysis from the ca package in a paper. All of the figures in the paper were done with ggplot. So, I wanted the visualization for the correspondence analysis to match the style of the other figures. The standard plot method plot.ca() however, produces base graphics plots. So, I had to create the ggplot visualization myself. Actually, I don’t know if there are any packages that take a ca object (created by the ca package) and produce ggplots from it.
What I will show you In this post, I want to show you a few ways how you can save your datasets in R. Maybe, this seems like a dumb question to you. But after giving quite a few R courses mainly - but not only - for R beginners, I came to acknowledge that the answer to this question is not obvious and the different possibilites can be confusing.