May 17, 2021
Plotting Bollinger Bands with Plotly Graph Objects
For this blog, I will demonstrate how to plot Bollinger Bands using Plotly. Bollinger Bands consist of upper and lower bounds (±2 standard deviations) around the moving average of stock data. I will break down this tutorial blog into three steps: (1) Obtain Data: use AlphaVantage for IBM historical data; (2) Calculate Moving Averages and Standard…
Bollinger Bands · 3 min read

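As a rough illustration of the band calculation the excerpt describes, here is a minimal sketch in plain Python. The function name `bollinger_bands`, the 20-period default window, the use of the population standard deviation, and the sample prices are all assumptions made for this example; the post itself may use different settings and a different library.

```python
from statistics import mean, pstdev


def bollinger_bands(prices, window=20, num_std=2.0):
    """Return (middle, upper, lower) lists aligned with `prices`.

    Entries are None until a full window of prices is available.
    `window` and `num_std` are illustrative defaults, not the post's values.
    """
    middle, upper, lower = [], [], []
    for i in range(len(prices)):
        if i + 1 < window:
            # Not enough history yet to fill the window.
            middle.append(None)
            upper.append(None)
            lower.append(None)
            continue
        chunk = prices[i + 1 - window : i + 1]
        ma = mean(chunk)    # moving average (middle band)
        sd = pstdev(chunk)  # population standard deviation over the window
        middle.append(ma)
        upper.append(ma + num_std * sd)
        lower.append(ma - num_std * sd)
    return middle, upper, lower


# Synthetic closing prices, made up for the example.
prices = [10.0, 11.0, 12.0, 11.0, 10.0, 12.0, 13.0]
middle, upper, lower = bollinger_bands(prices, window=3)
```

The three returned lists line up index by index with the input prices, so they could be passed as the y-values of three line traces in a plotting library.
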
May 9, 2021
Using Stock Data for Classification Problem: Action
This blog will demonstrate a simple way to frame financial stock data into a sequence classification problem. The business case is to, given historical stock data, create a model that will predict whether a trade (action) will be ‘Positive’ or ‘Negative.’ I can turn the business case into an ML problem…
Stock Market · 5 min read

Apr 28, 2021
Load Data CSV into MySQL
For this blog, I will import a CSV file into a MySQL server (using MAMP) to create a practice platform for SQL statements. More specifically, I will import the Titanic dataset (train.csv), which can be found here (https://www.kaggle.com/c/titanic). This tutorial will be broken down into three steps: Installing MAMP; Installing and editing configuration…
Sql · 4 min read

Apr 26, 2021
Chi-Squared Test for Independence
Pearson’s chi-squared test for independence is used to test whether there is an association between categorical variables by checking whether the observed counts differ significantly from the expected counts. The test uses the aggregated counts of the categorical variables that summarize the data into a table called…
Chi Square Test · 4 min read

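To make the comparison of expected and observed counts concrete, here is a small sketch in plain Python. It computes only the test statistic and the degrees of freedom; turning the statistic into a p-value needs the chi-squared distribution, which this sketch leaves out. The function name `chi_squared_statistic` and the 2x2 table are assumptions made up for illustration.

```python
def chi_squared_statistic(observed):
    """Return (statistic, degrees_of_freedom) for a table of observed counts.

    `observed` is a list of rows; each row is a list of counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows are one categorical variable, columns the other.
table = [
    [10, 20],
    [20, 10],
]
statistic, dof = chi_squared_statistic(table)
```

A large statistic relative to the chi-squared distribution with `dof` degrees of freedom is evidence against independence.
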
Apr 15, 2021
Summary of Agile: Scrum
Agile is an approach to project management that aims to always have a working product while continuously improving it in short increments. Instead of delivering a product at the end, as is the case with Waterfall, Agile looks to provide a minimum viable product (MVP) and improve on it iteratively based on…
Agile · 6 min read

Apr 9, 2021
Suez Canal Blockage: Queue Backlog with Sentinel-1 SAR
On March 23, 2021, a massive container ship, Ever Given, was found stuck in the Suez Canal. The Suez Canal is an important trade route, as it provides a water path between Europe and Asia without going around Africa. …
Suez Canal · 4 min read

Apr 5, 2021
Classification: Class Imbalance
For this blog, I will demonstrate three techniques to handle class imbalance using NYS PUMS (Public Use Microdata Sample) Census data. (You can find the dataset here.) Training classification models with imbalanced classes can bias the model toward predicting the majority class. Class Imbalance: Undersampling, Oversampling, SMOTE-NC. A pseudo-objective is to classify…
Classification · 3 min read

Mar 28, 2021
Pump it Up: Data Mining the Water Table — Population Analysis
For this blog, I will run a hypothesis test of whether the population count around a well affects its functionality. …
Tanzania · 4 min read

Mar 21, 2021
L1, L2 Regularization in XGBoost Regression
Regularization in gradient boosted regression trees is applied to the leaf values and not to the feature coefficients, as in lasso/ridge regression. For this blog, I will break down the explanation into three steps: (1) Lasso & Ridge Regression: a brief recap of lasso and ridge regression; (2) Gradient Boosted Regression Trees …
Xgboost · 4 min read

Mar 15, 2021
Absorbing Markov Chain: Limiting Matrix
I recently came across an interesting problem that required some understanding of Absorbing Markov Chains. The objective is to calculate the long-run percentages of ending states given an initial state. The input is a frequency table where each state has counts of transitions based on its index. …
Markov Chains · 3 min read

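One way to sketch the calculation described in the excerpt is to row-normalize the frequency table into transition probabilities and then solve for the absorption probabilities B = (I − Q)⁻¹R, where Q holds transient-to-transient transitions and R holds transient-to-absorbing transitions. The code below is a minimal illustration under stated assumptions, not necessarily the post's method: it treats any state whose row of counts is all zeros as absorbing, assumes every transient state can eventually reach an absorbing state, and uses a made-up function name and table.

```python
from fractions import Fraction


def absorption_probabilities(counts):
    """Map each transient state to {absorbing state: long-run probability}.

    `counts[i][j]` is how often state i was observed moving to state j.
    Assumption: a row of all zeros marks an absorbing state.
    """
    n = len(counts)
    transient = [i for i in range(n) if sum(counts[i]) > 0]
    absorbing = [i for i in range(n) if sum(counts[i]) == 0]
    t = len(transient)

    def prob(i, j):
        # Row-normalized transition probability from state i to state j.
        return Fraction(counts[i][j], sum(counts[i]))

    # Build the augmented matrix [I - Q | R].
    aug = []
    for r, i in enumerate(transient):
        left = [(1 if r == c else 0) - prob(i, j) for c, j in enumerate(transient)]
        right = [prob(i, j) for j in absorbing]
        aug.append(left + right)

    # Gauss-Jordan elimination turns the left block into I,
    # leaving (I - Q)^-1 R in the right block.
    for col in range(t):
        pivot = next(r for r in range(col, t) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(t):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]

    return {i: dict(zip(absorbing, row[t:])) for i, row in zip(transient, aug)}


# Hypothetical frequency table: states 0 and 1 are transient, 2 and 3 absorbing.
counts = [
    [0, 2, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
result = absorption_probabilities(counts)
```

Using `Fraction` keeps the results exact, which makes it easy to check that each transient state's absorption probabilities sum to 1.
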