Projects I've Worked On
Over the years, I've taken on numerous projects, and here are the ones I'm most proud of. Many of these are open-source, so if something catches your eye, feel free to explore the code and contribute your ideas for improvement.
Sentiment Analysis
This project involves building a sentiment analysis model using a Recurrent Neural Network (RNN) with TensorFlow and Keras. The model is designed to classify text data based on sentiment, leveraging LSTM layers for handling sequential data. Hyperparameter optimization is performed using GridSearchCV to improve model accuracy. The project showcases the integration of machine learning techniques with deep learning frameworks for natural language processing tasks.
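As a rough sketch of what that architecture looks like in Keras (vocabulary size, sequence length, and layer widths here are illustrative assumptions, not the project's actual settings, and the GridSearchCV step is omitted):

```python
# Minimal sketch of an LSTM sentiment classifier in TensorFlow/Keras.
# All sizes below are assumed for illustration.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000   # assumed tokenizer vocabulary
MAX_LEN = 200         # assumed padded review length

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),
    layers.LSTM(64),                        # handles the sequential text
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # binary sentiment output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```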
github.com
I’m Something of a Painter Myself
I leveraged TensorFlow and Keras to implement a CycleGAN capable of transforming photos into paintings in the style of Monet. This involved creating dual generators and discriminators to facilitate bidirectional image translation and maintain realism using adversarial and cycle consistency losses. By integrating advanced techniques like GroupNormalization and employing TPU (and GPU) acceleration for training, I aimed to achieve high-quality image generation.
github.com
NLP Disaster Tweets Kaggle Project
The models in this project were designed to predict which tweets are about real disasters from the Kaggle Competition "Natural Language Processing with Disaster Tweets". The notebook is broken down into 5 main sections: Data Exploration, Preprocessing, Model Architecture, Results & Analysis, and Model Testing.
github.com
CNN Cancer Detection Kaggle Project
This project aims to identify metastatic tissue in histopathologic scans of lymph node sections from the Kaggle competition "Histopathologic Cancer Detection". The notebook is broken down into 4 main sections: Data Exploration, Model Architecture, Results & Analysis, and Model Testing.
github.com
Weather Clustering
This project involves clustering weather data to identify patterns in weather conditions. The dataset is sourced from Kaggle and contains synthetic weather data designed to model different locations. Although the data is synthetic, the same preprocessing and clustering techniques apply to real-world data.
github.com
BBC News Classification Kaggle Project
The goal of this project was to classify news articles from the BBC into one of five categories: business, entertainment, politics, sport, or tech. I conducted some EDA, preprocessed the data, implemented sklearn's NMF model, and then compared the NMF model to supervised methods. I also assessed the limitations of NMF on the Netflix movie review dataset.
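A condensed sketch of the unsupervised half of that workflow (the vectorizer settings and the placeholder article list are assumptions):

```python
# Sketch: TF-IDF features + NMF topic extraction, then assign each
# article the topic with the largest weight.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

texts = ["stock markets rallied today", "the band released a new album"]  # placeholder articles

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(texts)

nmf = NMF(n_components=2, random_state=42)   # the real project used 5 BBC categories
W = nmf.fit_transform(X)                     # article-topic weight matrix
print(W.argmax(axis=1))                      # dominant topic per article
```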
github.com
NextAuth Demo
This NextAuth demo shows how to use Next.js with NextAuth for authentication. It is a simple front end that allows users to sign in with Google, Facebook, or email. The backend includes APIs, middleware, and a database to store user information. This project can be run by following the instructions in the README.
github.com
Breast Cancer Prediction
This project identifies key features associated with breast cancer diagnoses and uses machine learning models to predict the malignancy of breast tumors. Four models and an ensemble method are compared on accuracy, precision, recall, and other metrics. The goal is to enhance early detection and treatment of breast cancer, thereby improving patient outcomes.
github.com
ChatPDF
ChatPDF is a project that allows users to upload a PDF and have a conversation with the text. The project parses the PDF to extract text from the file and then uses the OpenAI API to generate a conversation based on the text. It creates embeddings that are uploaded into Pinecone, while the file is hosted in a user folder within an S3 bucket on AWS.
github.com
Automodel
Automodel (Finarima on GitHub) was developed as the final project for my Software Architecture class. It is a data application designed to return ARIMA (AutoRegressive Integrated Moving Average) models and associated plots for a given security and time period. The application follows a robust architecture incorporating external data-collection APIs, data processing, testing suites, and data storage mechanisms. The application is no longer hosted, but the code for both the front end and the API is available on GitHub. The API was built as a separate container, hosted on ECS, and then deployed to Lambda (linked in the README).
github.com
Match Predictor
Match Predictor predicts the outcome of a match based on the teams playing. It uses a dataset of 100,000 matches with 11 features and a logistic regression model to predict the result.
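A minimal sketch of that setup (the file name and column names are placeholders, not the project's actual schema):

```python
# Sketch: fit a logistic regression on match features and report accuracy.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("matches.csv")          # assumed dataset of ~100,000 matches
X = df.drop(columns=["outcome"])         # the 11 feature columns
y = df["outcome"]                        # assumed label column

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```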
github.com
Provenance
Provenance demonstrates a common architecture for data collection: it collects and stores articles from infoq.com and follows a Netflix Conductor-esque design. There were two main classes to work on in this project: ArticlesController and Endpoints. In ArticlesController, I worked on the handle function to retrieve all articles with a predefined findAll() function, retrieve only available articles with findAvailable(), and then convert the articles to JSON using the predefined writeJSONBody. In Endpoints, I mapped RSS results to an article collection and saved the articles to the article gateway with the execute() function, using a predefined XmlMapper to convert RSS feeds to Java objects. After completing both sections, I scheduled the work in the App class's start() function using the previously defined classes and functions. To build and test, I used the given Gradle commands and JUnit.
github.com
Provenance Metrics
The goal of this project is to look at service-level indicators within the Provenance codebase. First, I created a Prometheus job to store the metrics data. Next, I added Prometheus as a data source in Grafana. Last, I created a Grafana dashboard to query, visualize, and alert on metrics from multiple sources (in this case, the Prometheus data source).
github.com
Email Verifier
The Email Verifier is a project that demonstrates how a message queue works. It simulates someone registering their email address and receiving a verification code at that address. Most of the code was prebuilt, and there were four servers we set up to run our tests; the initial setup instructions are in the README. I built test.js using k6: the test lasts 30 seconds, simulates 10 users, sends a POST request, and asserts a 204 response. I was also tasked with running, testing, and improving the benchmarks in benchmark.kt within the applications folder. I intentionally made a registrations-per-second test fail so I could see the response, and I implemented a consistent hash exchange in the registration server.
github.com
The Milk Problem Continued
This project explores the challenges of real-time product inventory tracking. It focuses on 'dirty reads' in high-availability databases, offering an exercise that introduces transactions and event collaboration with RabbitMQ. The project provides a comprehensive guide for setting up the application environment and includes tasks and tests for practical learning. It's a valuable resource for understanding the trade-off between consistency and availability in databases.
github.com
The Milk Problem
The Milk Problem demonstrates a common error with streaming data where one client receives the same request as another client. In the literal example, multiple grocery stores order milk and all receive the same number (131). This project represents managing product inventory and highlights the use of database transactions. Before following the README instructions, I filled in the TODOs: first, I implemented a decrementBy function in ProductService.kt, and then I moved to App.kt and replaced the update function with decrementBy to improve accuracy. After I completed these, the database was created and working on the server.
github.com
Simple Aged Cache
Simple Aged Cache is an assignment in my master's program that implements and tests a simple aged cache in both Java and Kotlin. I initialized the parameters clock, size, and cache, then moved on to both SimpleAgedCache constructors (one initialized with a specified clock and the other with a default clock) and the helper functions put(), isEmpty(), and size(). Next, I worked on the ExpirableEntry class by initializing the parameters key, value, and expirationTime, initializing the constructor, and implementing the helper function isExpired(); getKey() and getValue() were given. The Kotlin implementation followed essentially the same process. I used Gradle to build the project and JUnit to test it.
github.com
Simple Blockchain
Simple Blockchain is an assignment in my master's program that implements a simple blockchain in Java. I revised the Block class by working on the calculatedHash() function and assigning its value to the hash variable. Then I worked on the Blockchain class: I first declared a linked list of blocks, then implemented the isEmpty(), add(), and size() helper functions before moving on to the isValid() function. This is where I actually implemented the blockchain logic, checking for an empty chain, a chain of one, and then a chain of many, iterating through each block to compare the current and previous hash values. The supporting functions were given. I used Gradle to build the project and JUnit to test it.
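The assignment is in Java, but the heart of the isValid() check translates to a short Python sketch (the dictionary field names below are assumptions):

```python
# Sketch: a chain is valid if every block's stored hash matches a recomputed
# hash and every block points at the previous block's hash.
import hashlib

def calculated_hash(block):
    payload = f"{block['previous_hash']}{block['data']}".encode()
    return hashlib.sha256(payload).hexdigest()

def is_valid(chain):
    for i, block in enumerate(chain):                 # empty chain: loop never runs
        if block["hash"] != calculated_hash(block):   # tampered block contents
            return False
        if i > 0 and block["previous_hash"] != chain[i - 1]["hash"]:
            return False                              # broken link to previous block
    return True
```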
github.com
Genius
Genius was my introduction to full-stack web development and forced me to apply all of my previous knowledge to a real-world project. Using Next.js, React, Tailwind, Prisma, Stripe, Clerk, and OpenAI, I built an e-commerce SaaS website around 5 generative AI products. It is hosted on Vercel at genius-iota-two.vercel.app.
genius.com
Unordered Graph Search
This project was for a class I took as a prerequisite for a master's program I applied to. The project was to implement an unordered graph search in C++. This is a private project since there is personally identifiable information in the code. It is available upon request.
github.com
Unordered Map
Similar to Unordered Graph Search, this project was for a class I took as a prerequisite for a master's program I applied to. The project was to implement an unordered map in C++. This is a private project since there is personally identifiable information in the code. It is available upon request.
github.com
Generic Tree
Similar to Unordered Graph Search and Unordered Map, this project was for a class I took as a prerequisite for a master's program I applied to. The project was to implement a tree in C++. This is a private project since there is personally identifiable information in the code. It is available upon request.
github.com
Image Transform
Similar to the above, this project was for a class I took as a prerequisite for a master's program I applied to. The project was to apply grayscale and other filters to images in C++. This is a private project since there is personally identifiable information in the code. It is available upon request.
github.com
Text Cap App
Prompts the user for text, capitalizes it based on which button is pressed, and allows the user to copy the result to the clipboard.
github.com
Emoji Dictionary App
A dictionary of emojis (not comprehensive), each with a rating, with the total count displayed at the top. Building it taught me how to use different views.
github.com
Fiftyville
Fiftyville provided 2 clues about a theft that took place: time and place. My task was to create queries based on the information provided, investigate those results, and create queries based on the results to find who the thief was, what city they escaped to, and who their accomplice was.
github.com
Movies
Using a database from IMDb, Movies tasked me with creating various queries for movie information. movies.db provides data about movies, stars, directors, and ratings. I implemented 13 different queries; the implementation details are located at the top of each SQL file.
github.com
Songs
songs.db is a database that stores data from Spotify about the music on its platform, containing the top 100 streamed songs from 2018. This project required me to implement 8 different queries for various data, some of which span overlapping tables. The implementation details are located at the top of each SQL file.
github.com
DNA
In DNA, I coded a program that analyzes a DNA sequence, counts the STRs (short tandem repeats) it contains, and returns who the sequence belongs to. The database that the sequence is matched against has a column of names followed by columns with the number of each STR in that person's DNA. The program matches the count of each STR in the sequence to the corresponding person and returns whose sequence it is.
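A compact sketch of that matching logic (the longest-run counter and CSV layout below are illustrative, not the submitted code):

```python
# Sketch: count the longest consecutive run of each STR in the sequence,
# then return the name whose database counts match exactly.
import csv

def longest_run(sequence, str_pattern):
    best = 0
    step = len(str_pattern)
    for i in range(len(sequence)):
        count = 0
        while sequence[i + count * step : i + (count + 1) * step] == str_pattern:
            count += 1
        best = max(best, count)
    return best

def identify(db_path, sequence):
    with open(db_path) as f:
        rows = list(csv.DictReader(f))
    strs = [col for col in rows[0] if col != "name"]
    counts = {s: longest_run(sequence, s) for s in strs}
    for row in rows:
        if all(int(row[s]) == counts[s] for s in strs):
            return row["name"]
    return "No match"
```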
github.com
Caesar's Cipher
Caesar's Cipher is one of the oldest methods of encrypting a message. As an assignment in CS50, I coded a program in C that uses the same methodology as the cipher: it prompts the user for text, shifts each character by a key value (provided when the program is run), and prints the encrypted message.
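The assignment was written in C, but the shifting logic itself fits in a few lines of Python:

```python
# Sketch of the Caesar shift: rotate alphabetic characters by `key`
# and leave everything else (spaces, punctuation, digits) untouched.
def caesar(plaintext: str, key: int) -> str:
    out = []
    for ch in plaintext:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + key) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

print(caesar("Hello, World!", 3))   # -> Khoor, Zruog!
```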
github.com
Runoff
Runoff was a program designed to emulate a runoff election. I created six different functions to validate and count a vote, tabulate the votes, find the candidate with the minimum number of votes, eliminate the candidates with the fewest votes, and print the winner or report a tie.
github.com
Plurality
Similar to Runoff, this program emulates another election method, but voters can select more than one candidate. It prompts users for their top candidates and tallies which candidate has the most votes in total, not just the preferred candidate.
github.com
Scrabble
This assignment tasked me with creating a program that determines the winner of a short Scrabble game. The program prompts two users for words, and the highest score wins. To compute the score, I translated capital letters to lowercase and subtracted 65 to get the correct ASCII value. Then the scores are summed and a winner is assigned.
github.com
Recover
This assignment tasked me with making a program that recovers JPEGs from a forensic image. The program opens the memory card, reads 512 bytes at a time into a buffer until the end of the card, and checks whether the bytes indicate the start of a JPEG; when they do, it creates a new output file and writes the recovered image to it.
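The original is in C, but the block-scanning approach can be sketched in Python (the card filename and output naming are assumptions):

```python
# Sketch: read the forensic image 512 bytes at a time; when a block starts
# with a JPEG signature (0xFF 0xD8 0xFF 0xE0-0xEF), begin a new output file.
BLOCK = 512

def recover(card_path):
    count, out = 0, None
    with open(card_path, "rb") as card:
        while (buf := card.read(BLOCK)):
            starts_jpeg = (
                len(buf) >= 4
                and buf[0:3] == b"\xff\xd8\xff"
                and (buf[3] & 0xF0) == 0xE0
            )
            if starts_jpeg:
                if out:
                    out.close()
                out = open(f"{count:03}.jpg", "wb")   # e.g. 000.jpg, 001.jpg, ...
                count += 1
            if out:
                out.write(buf)
    if out:
        out.close()

recover("card.raw")   # assumed name of the forensic image
```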
github.com
Filter
This assignment had me program functions that let a user apply grayscale, sepia, reflection, or blur filters to an image. For grayscale, the program combs through each row and column, converts the pixel values to floats, and sets each channel to the average pixel value. Similarly, for sepia I used specific formulas to compute the sepia red, green, and blue values and updated each pixel accordingly. For the reflection, I flipped the rows to produce a mirror image. To blur the image, I created a copy of the image, combed through each row and column, gathered the neighboring pixels, calculated their average value, and copied the averaged pixels back into the original image.
github.com
Inheritance
Inheritance was a lab assignment that simulates genetic inheritance of blood type. Each person has two parents and two alleles. In the create_family function, I allocated memory for a new person, set the parent pointers for the current person, and assigned the current person's alleles based on the parents' alleles; if there were no more generations to create, the parent pointers were set to NULL and the alleles assigned randomly. In the free_family function, I freed the parents recursively and used free() to free the child.
github.com
Motion Capture
Using OpenCV and MediaPipe in Python, I extracted the key points of a human figure as it moved through a 2D video. Using the points in each frame, I wrote out an animation text file. Then, using C#, I created points in Unity from the x, y, and z coordinates given for each object in the file, so the points formed a 3D model that moved with the corresponding frames.
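A rough sketch of the Python half of that pipeline, using MediaPipe's Pose solution (the video filename and the per-frame text format are assumptions; the Unity/C# side is not shown):

```python
# Sketch: run MediaPipe Pose over a 2D video and dump x, y, z landmark
# coordinates per frame to a text file for the Unity animation step.
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

with mp_pose.Pose() as pose, open("animation.txt", "w") as out:
    cap = cv2.VideoCapture("input.mp4")          # assumed source video
    frame_idx = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            coords = [f"{lm.x:.4f},{lm.y:.4f},{lm.z:.4f}"
                      for lm in results.pose_landmarks.landmark]
            out.write(f"{frame_idx};" + "|".join(coords) + "\n")
        frame_idx += 1
    cap.release()
```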
github.com
Movie Review Classifier
Used sentiment analysis to build a predictor that classifies positive and negative movie reviews, trained on a dataset of 50,000 reviews provided by IMDb. It uses Python's pickle module, SQLite, and Flask for a web application that prompts a user for a review, makes a prediction, and can be closed without having to reload the dataset. Real-world uses of this approach include spam detection and recommendation systems. The project code was completed in 2019, but I actually built it in 2021.
github.com
Sentiment Analysis using Recurrent Neural Networks
I used a many-to-one architecture to construct an RNN for sentiment analysis, with the same dataset from the movie review classifier project. The first half of the dataset is training data and the second half is for testing. I started by building the class constructor, then added build, train, and predict methods. I instantiated the SentimentRNN class and trained the model for 40 epochs on the training data. Last, I used the model to predict the class labels on the test set and return the prediction probabilities.
github.com
Parallelizing Neural Network Training
This project aimed to introduce the TensorFlow library and convey its ability to define and train large, multilayer neural networks efficiently, as well as how the TensorFlow API can be used to build complex machine learning and neural network models. To start, I programmed in the low-level TensorFlow API. Next, I used TF Layers and Keras to build a multilayer neural network and learned how to build these models with each API.
github.com
Mechanics of TensorFlow
Covers key features and concepts of TensorFlow such as computation graphs (implementation and visualization using TensorBoard), launching a graph in a session environment, placeholders and variables, and evaluating tensors and executing operators. Touches on transforming tensors using transpose, reshape, split, and concat.
github.com
Machine Learning Classifiers
Expanding on Simple Artificial Neurons, this project covers algorithms for classification and provides examples, primarily using scikit-learn. First, I initialized the Perceptron and Adaline models and then converted them into an algorithm for logistic regression. Next, I tracked overfitting through regularization, used SVMs to classify different flowers, and initialized the gradient-descent versions of Perceptron, logistic regression, and SVM with default parameters. I then used a kernel SVM to handle linearly inseparable data and find separating hyperplanes. Next, I built a decision tree (pictured above). Lastly, I implemented a k-nearest neighbors (KNN) model using the Euclidean distance metric.
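A minimal sketch of one piece of that workflow, fitting a kernel SVM on standardized Iris features (the split and hyperparameters are illustrative):

```python
# Sketch: standardize the Iris features and fit an RBF-kernel SVM.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```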
github.com
Classifying Images w/ Deep Convolutional Neural Networks
Covers CNNs and explores the basics of CNN architecture. Defines the convolution operation, provides 1D and 2D implementations, and goes in depth on max- and mean-pooling. At the end, a deep convolutional neural network is built with the TensorFlow core API along with the TensorFlow Layers API and applied to image classification.
github.com
Character-Level Language Modeling with RNNs
Built an RNN model that takes a text document as input; the goal is to generate new text similar to the input document. I processed the data by collecting the unique characters, building a dictionary that maps each character to an integer and another dictionary for the reverse mapping, and converting the text to a NumPy array of integers, which I then reshaped into batches of sequences with a batch_generator function. I built a CharRNN class with a build method that used one-hot encoding instead of an embedding layer, a train method, and a sample method (similar to the predict method in the Sentiment Analysis RNN). I created a CharRNN instance to train on the data, then sampled from it to return a text document. The original data is in old English, which makes the generated text a little more interesting.
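A simplified sketch of the preprocessing and batching step (batch size and sequence length are assumed; the CharRNN model itself is omitted):

```python
# Sketch: encode characters as integers and yield (input, target) batches
# where each target sequence is the input shifted by one character.
import numpy as np

def make_batches(text, batch_size=64, num_steps=100):
    chars = sorted(set(text))
    char2int = {c: i for i, c in enumerate(chars)}   # forward mapping
    # the reverse mapping (int -> char) would be used when sampling new text
    encoded = np.array([char2int[c] for c in text], dtype=np.int32)

    batch_len = len(encoded) // batch_size
    data = encoded[:batch_size * batch_len].reshape(batch_size, batch_len)

    for start in range(0, batch_len - num_steps, num_steps):
        x = data[:, start:start + num_steps]
        y = data[:, start + 1:start + num_steps + 1]   # shifted targets
        yield x, y
```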
github.com
Ensemble Learning
This project is an introduction to combining different learning algorithms, which in turn creates more accurate and reliable predictions than a single learner. I started by implementing the probability mass function and then moved on to combining classifiers by majority vote, implementing a Majority Vote Classifier. I used this class to make predictions on the same Iris dataset as before, training it on three different classifiers: logistic regression, a decision tree classifier, and the k-neighbors classifier. Next, I evaluated and tuned the ensemble classifier by computing ROC curves and standardizing the training set for visual consistency of the decision tree (pictured above). Then I performed bagging and built an ensemble of classifiers from bootstrap samples. Last, I leveraged weak learners through Adaptive Boosting.
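The majority-vote idea in a condensed scikit-learn sketch, using the built-in VotingClassifier rather than the from-scratch class described above:

```python
# Sketch: hard-voting ensemble of logistic regression, a decision tree,
# and k-nearest neighbors on the Iris data.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=3)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="hard",   # each model gets one vote on the predicted label
)
print("mean CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```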
github.com
Clustering Analysis
This project takes a look at three different clustering algorithms: k-means clustering, agglomerative hierarchical clustering, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). K-means is an unsupervised method that clusters samples into spherical shapes around a specified number of cluster centroids; I used two performance metrics (the elbow method and silhouette analysis) to quantify the quality of the clustering. Hierarchical clustering doesn't require a specific number of clusters and is illustrated by a dendrogram, which helps visualize the results. DBSCAN groups points based on sample density and can handle outliers and identify clusters with non-globular shapes.
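A short sketch of the k-means portion with both quality checks mentioned above (the synthetic blobs stand in for the project's data):

```python
# Sketch: k-means on synthetic blobs, scored with the silhouette coefficient,
# plus an elbow-method loop over the inertia.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)
print("silhouette:", silhouette_score(X, labels))

for k in range(2, 7):   # elbow method: watch where inertia stops dropping sharply
    inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(inertia, 1))
```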
github.com
Regression Analysis
Primarily using scikit-learn, this project goes into depth on modeling linear relationships between target and response variables to make predictions on a continuous scale. First, I introduced a gradient-descent linear regression and implemented an Ordinary Least Squares (OLS) regression model. Next, I used RANSAC to deal with outliers in the dataset and estimated the coefficients of the regression model. Finally, I used polynomial feature transformation and random forest regressors to model nonlinear relationships between variables.
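A small sketch contrasting OLS with RANSAC on data containing injected outliers (the synthetic data is a stand-in for the project's dataset):

```python
# Sketch: ordinary least squares vs. RANSAC on one noisy feature;
# RANSAC's fitted slope should stay close to the true value of 3.
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X.ravel() + rng.normal(0, 1, 200)
y[:10] += 30                                   # a handful of outliers

ols = LinearRegression().fit(X, y)
ransac = RANSACRegressor(random_state=0).fit(X, y)

print("OLS slope:   ", round(ols.coef_[0], 2))
print("RANSAC slope:", round(ransac.estimator_.coef_[0], 2))
```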
github.com
Multilayer Artificial Neural Network
Starting with a single-layer neural network structure and connecting multiple neurons together, this project dove into the basics behind a multilayer artificial neural network, built from scratch. This Multilayer Perceptron (MLP) was designed to recognize handwritten digits from the MNIST Database of Handwritten Digits. I classified the data and implemented backpropagation to train the MLP.
github.com
Simple Artificial Neurons
This project classifies flowers into two categories, 'Setosa' and 'Versicolor', based on the variables 'Sepal Length' and 'Petal Length'. I programmed two simple artificial neurons and plotted their respective decision regions. It is broken up into two sections: Perceptron and Adaline. For the perceptron, I initialized the weights and then updated them using a unit step function after computing the output. I used the same process for Adaline, except the weights were updated using a linear activation function.
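A bare-bones sketch of the perceptron half (hyperparameters are illustrative; the Adaline version swaps the unit step for a linear activation when computing the update):

```python
# Sketch: perceptron learning rule. Compute the unit-step output,
# then nudge weights and bias by the learning rate times the error.
import numpy as np

class Perceptron:
    def __init__(self, eta=0.01, n_iter=50, seed=1):
        self.eta, self.n_iter, self.seed = eta, n_iter, seed

    def fit(self, X, y):
        rng = np.random.RandomState(self.seed)
        self.w_ = rng.normal(scale=0.01, size=X.shape[1])  # small random weights
        self.b_ = 0.0
        for _ in range(self.n_iter):
            for xi, target in zip(X, y):
                update = self.eta * (target - self.predict(xi))
                self.w_ += update * xi
                self.b_ += update
        return self

    def predict(self, X):
        return np.where(np.dot(X, self.w_) + self.b_ >= 0.0, 1, 0)
```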
github.com
ARIMA Modeling
In my Financial Econometrics course, I learned about many different modeling methods, and one of my favorites was the ARIMA model. Using my knowledge from the course and the help of the internet, I modeled monthly milk production and plotted a forecast alongside the actual data. I did this by creating an ETS plot, running an augmented Dickey-Fuller test, creating an autocorrelation plot, and then fitting a seasonal ARIMA model.
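A sketch of that workflow with statsmodels (the CSV path, column handling, and the (p, d, q)(P, D, Q, s) orders are assumptions, not the orders I actually selected):

```python
# Sketch: stationarity check with the augmented Dickey-Fuller test,
# then a seasonal ARIMA fit and a two-year forecast.
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.statespace.sarimax import SARIMAX

milk = pd.read_csv("monthly_milk_production.csv",      # assumed file
                   index_col=0, parse_dates=True).squeeze()

print("ADF p-value:", adfuller(milk.dropna())[1])       # > 0.05 suggests non-stationarity

model = SARIMAX(milk, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)
print(result.forecast(steps=24).head())                 # forecast to plot alongside actuals
```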
github.com
Equity Portfolio Optimization
Using pandas, NumPy, SciPy, and Matplotlib, I pulled chart data for a combination of 12 stock tickers, set the portfolio allocation percentages, and tracked the performance of both the individual stocks and the entire portfolio. I then plotted the Sharpe ratio and a Markowitz portfolio optimization model to visualize the risk-return trade-off.
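The Sharpe ratio portion reduces to a few lines (the price file, equal weighting, and a near-zero risk-free rate are assumptions):

```python
# Sketch: portfolio daily returns from a price table, then an
# annualized Sharpe ratio assuming ~252 trading days and rf ~ 0.
import numpy as np
import pandas as pd

prices = pd.read_csv("prices.csv", index_col=0, parse_dates=True)   # assumed 12-ticker price table
weights = np.full(prices.shape[1], 1 / prices.shape[1])             # equal allocation for illustration

daily_returns = prices.pct_change().dropna()
port_returns = daily_returns @ weights

sharpe = np.sqrt(252) * port_returns.mean() / port_returns.std()
print("annualized Sharpe ratio:", round(sharpe, 3))
```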
github.com
Market Analysis
One of my earliest Python projects, this is a simple visualization of the stock performance of Ford, Tesla, and GM, as well as their volume over time. The data for each stock is stored in its own CSV file and covers January 1, 2017 through January 1, 2022. It also prints the performance from the latest close.
github.com