R Questions Tag Pairs on Stackoverflow
Months ago, I passed by R Questions from Stack Overflow published on Kaggle. I was interested in tag pairs in particular, i.e. which tags appear together in R questions, so I worked on this simple kernel.
Months ago, I passed by R Questions from Stack Overflow published on Kaggle. I was interested in tag pairs in particular, i.e. which tags appear together in R questions, so I worked on this simple kernel.
Last February, I read about prophet package, which was released by Facebook’s Core Data Science team. I skimmed the published paper Forecasting at Scale quickly and I got the main concept. I also liked how the creators; Sean Taylor and Ben Letham were trying to empower analysts to produce high quality forecasts by offering them a flexible and configurable model that requires general understanding, but not necessarily deep knowledge about time series models.
A couple of weeks ago, I had a discussion with a co-worker regarding a project I was involved in, I felt that there was no clear understanding of the daily challenges data scientists face. Few days later, I was at Rstudio::Conf 2017 where I met lots of data scientists from academia and industry. Later on, I described one of the conference’s positive side effects as “group therapy”, where one could see how others face the same challenges and struggle with similar issues. So it was like a personal sanity check!
Everyday statisticians, analysts and data enthusiasts perform data analysis for different purposes. But when it comes to presenting analyses to wider audience, the good work is not the complex one with big words. It is the one that highlights interesting relations, answers business questions or predict outcomes, and explain all that in the simplest way through data visualization or simple concepts. So if one throws numbers, model coefficients and complex graphs to impress the audience, it might fireback if the audience are not familiar with a certain concept.
Few days ago, I wanted to explore the Climate Change: Earth Surface Temperature Data dataset published on Kaggle and originally compiled by Berkeley Earth. The dataset is relatively large as it contains entries from 1750-2014!
Data visualization is a means of visual communication that should help people understand the significance of data easily and see interesting trends, patterns, distributions,..etc. If your audience fails to grasp the message that was intended to be conveyed by the graph, they are not to be blamed. You are! or to be precise, your choice of the graphical representation of the data.
Using googleVis via R provides lots of options to create nice google visualizations. I was trying to create some charts while exploring the Annual Nominal Fish Catches Data on Kaggle. I wanted to create a line motion chart and exclude the default bubble chart. So I played with the options to get the desired result. The following is a quick explanation of how to do that.
Once upon a data, there were outliers and influential observations in regression models. Using these models, we learnt that a common practice was to perform diagnostics checks to dig deeper and see how different points affect the fitted model or its coeffecients. So here we go!
Since I started to work with R, I became a frequent visitor to R-bloggers web site where I find a variety of helpful tips and tutorials. Now, as I started my own blog, it is time to give a shout-out to them!