Create New Git Repo From Shallow Clone

Jan. 27, 2021, 8:59 a.m.

How to create a new git repo from a shallow clone so that the old history is not carried over:

git clone [old repo] --depth 1

git remote add new_source [new repo]

git remote remove origin

git rev-parse --verify master >> .git/info/grafts

git filter-branch -- --all

git push new_source master

Labels: git

Exploratory Data Analysis

Dec. 24, 2020, 11:34 a.m.

When I start working on a machine learning project, my first impulse is always to try to fit some models. At the end of the project I always remember how important exploratory data analysis (EDA) is, and wish I had remembered sooner. Even on projects where EDA doesn't seem necessary, it usually is.

I have been working on an instance detection challenge, and at first I wondered what use EDA would be on a dataset of annotated images. It turns out, a lot. After doing some EDA I found that many of the annotations were wrong, and simply by correcting them I was able to greatly increase my model's performance.
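
The post does not say which kinds of annotation errors turned up, so the following is only a sketch of the sort of sanity check that can surface bad annotations. The record layout, the label set, and the function name `find_annotation_problems` are assumptions made for illustration, not the format used in the challenge.

```python
from collections import Counter


def find_annotation_problems(annotations, image_sizes, known_labels):
    """Return a list of (image_id, reason) for annotations that look wrong.

    Each annotation is assumed to be a tuple:
    (image_id, label, x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    problems = []
    for image_id, label, x_min, y_min, x_max, y_max in annotations:
        width, height = image_sizes[image_id]
        if label not in known_labels:
            problems.append((image_id, "unknown label"))
        if x_max <= x_min or y_max <= y_min:
            problems.append((image_id, "empty or inverted box"))
        elif x_min < 0 or y_min < 0 or x_max > width or y_max > height:
            problems.append((image_id, "box outside image"))
    return problems


# Made-up sample data, only to exercise the checks
image_sizes = {"img_1": (640, 480), "img_2": (640, 480)}
annotations = [
    ("img_1", "cat", 10, 20, 200, 220),    # looks fine
    ("img_1", "cta", 30, 40, 100, 120),    # label typo
    ("img_2", "dog", 300, 100, 250, 180),  # x_max is smaller than x_min
    ("img_2", "dog", 500, 400, 700, 470),  # extends past the right edge
]

problems = find_annotation_problems(annotations, image_sizes, {"cat", "dog"})
reason_counts = Counter(reason for _, reason in problems)
print(reason_counts)
```

Checks like these only catch mechanical mistakes; mislabeled or missing objects still need a visual pass over a sample of images.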

In addition, by doing some EDA on the predictions from a fitted model I was able to identify some common causes of errors and attempt to address them.
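
One way to look for common causes of errors is to group predictions by some attribute and compare error rates. This is a minimal sketch under assumed names: the `records` layout, the `box_size` attribute, and `error_rate_by_group` are hypothetical and not taken from the project described here.

```python
from collections import defaultdict


def error_rate_by_group(records, group_key):
    """Return {group: fraction of records whose prediction was wrong}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record["predicted"] != record["actual"]:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Made-up predictions, tagged with a hypothetical attribute to group by
records = [
    {"actual": "cat", "predicted": "cat", "box_size": "large"},
    {"actual": "cat", "predicted": "dog", "box_size": "small"},
    {"actual": "dog", "predicted": "dog", "box_size": "large"},
    {"actual": "dog", "predicted": "cat", "box_size": "small"},
    {"actual": "dog", "predicted": "dog", "box_size": "small"},
    {"actual": "cat", "predicted": "cat", "box_size": "large"},
]

rates = error_rate_by_group(records, "box_size")
for group, rate in sorted(rates.items()):
    print(group, round(rate, 2))
```

A group with a much higher error rate than the rest is a good place to start looking at individual examples.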

Labels: machine_learning

I've had good luck with multi-scale training for image detection, so I wanted to try it for classifying images of different sizes with objects at differing scales. I found some base code (https://github.com/CaoWGG/multi-scale-training), but it is written for custom PyTorch datasets, not for ImageFolders. I wrote some code to extend it to ImageFolders, which is in the gist below:

"""Based on https://github.com/CaoWGG/multi-scale-training"""
import numpy as np
import torch
import torchvision
from torch.utils.data import Sampler, RandomSampler, SequentialSampler
from torchvision import transforms


class BatchSampler(object):
    """Batch sampler that yields [index, image_size] pairs.

    Every `multiscale_step` batches a new image size is drawn from `img_sizes`.
    """

    def __init__(self, sampler, batch_size, drop_last, multiscale_step=None, img_sizes=None):
        if not isinstance(sampler, Sampler):
            raise ValueError("sampler should be an instance of "
                             "torch.utils.data.Sampler, but got sampler={}"
                             .format(sampler))
        if not isinstance(drop_last, bool):
            raise ValueError("drop_last should be a boolean value, but got "
                             "drop_last={}".format(drop_last))
        self.sampler = sampler
        self.batch_size = batch_size
        self.drop_last = drop_last
        if multiscale_step is not None and multiscale_step < 1:
            raise ValueError("multiscale_step should be > 0, but got "
                             "multiscale_step={}".format(multiscale_step))
        if multiscale_step is not None and img_sizes is None:
            raise ValueError("img_sizes must be a list, but got img_sizes={}".format(img_sizes))
        self.multiscale_step = multiscale_step
        self.img_sizes = img_sizes

    def __iter__(self):
        num_batch = 0
        batch = []
        # size used until the first random draw
        size = 416
        for idx in self.sampler:
            # every sample in a batch carries the same size so the batch can be stacked
            batch.append([idx, size])
            if len(batch) == self.batch_size:
                yield batch
                num_batch += 1
                batch = []
                # draw a new size for the next multiscale_step batches
                if self.multiscale_step and num_batch % self.multiscale_step == 0:
                    size = int(np.random.choice(self.img_sizes))
        if len(batch) > 0 and not self.drop_last:
            yield batch

    def __len__(self):
        if self.drop_last:
            return len(self.sampler) // self.batch_size
        else:
            return (len(self.sampler) + self.batch_size - 1) // self.batch_size


class MultiscaleDataSet(torchvision.datasets.ImageFolder):
    """Multiscale ImageFolder dataset"""

    def __getitem__(self, index):
        if isinstance(index, (tuple, list)):
            index, input_size = index
        else:
            # set the default image size here
            input_size = 448
        path, target = self.samples[index]
        sample = self.loader(path)
        # resize the image
        sample = sample.resize((input_size, input_size))
        # apply the transforms passed to the constructor, as ImageFolder itself does
        if self.transform is not None:
            sample = self.transform(sample)
        if self.target_transform is not None:
            target = self.target_transform(target)
        # return the image and label
        return sample, target


# The transform definition was left blank in the original; ToTensor() is a
# minimal stand-in (an assumption) so that batches can be collated.
transform = transforms.ToTensor()
# batch_size was not defined in the original; 32 is a placeholder value
batch_size = 32

# create the dataset and loader
train_dataset = MultiscaleDataSet(
    root="data/train",
    transform=transform
)
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_sampler=BatchSampler(RandomSampler(train_dataset),
                               batch_size=batch_size,
                               multiscale_step=1,
                               drop_last=True,
                               img_sizes=[320, 384, 448, 512, 576, 640]),
    num_workers=7,
)
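
To see which image size each batch ends up with, the sampler's loop can be mirrored without PyTorch. This is a rough sketch, not part of the gist: `multiscale_schedule` is a hypothetical helper, it only models full batches (as with `drop_last=True`), and it uses Python's `random` module with a fixed seed instead of `np.random`.

```python
import random


def multiscale_schedule(num_samples, batch_size, img_sizes, multiscale_step,
                        default_size=416, seed=0):
    """Return the image size used for each full batch, mirroring the sampler loop."""
    rng = random.Random(seed)
    sizes = []
    size = default_size
    for num_batch in range(1, num_samples // batch_size + 1):
        # the batch that was just filled is yielded at the current size
        sizes.append(size)
        # after every multiscale_step batches, draw the size for the following ones
        if multiscale_step and num_batch % multiscale_step == 0:
            size = rng.choice(img_sizes)
    return sizes


img_sizes = [320, 384, 448, 512, 576, 640]
schedule = multiscale_schedule(num_samples=64, batch_size=8,
                               img_sizes=img_sizes, multiscale_step=2)
print(schedule)
```

Note that the first `multiscale_step` batches always use the hard-coded starting size of 416, which is not one of the values in `img_sizes`; only later batches are drawn from the list.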

Labels: machine_learning , pytorch

Azure Spot Instances

Nov. 26, 2020, 6:39 a.m.

I have some free Azure student credit, so yesterday I decided to try training some of my models on Azure VMs. I soon realized that a student account does not include quota for any GPU more powerful than a K80, and there is no way to request a quota increase on a student account. However, the student account does include quota for "low priority instances", or spot instances, which are pre-emptible. So I set up a spot VM.

On AWS, spot VMs can sometimes run for days before being pre-empted. Not so on Azure. I tried about half a dozen times, and no instance ever lasted long enough to complete even half an epoch (about an hour). I was very disappointed because the spot prices were much better than AWS spot prices. Azure lets you set the maximum price you are willing to pay for a spot instance, but even setting it above the on-demand price didn't make any difference.

My final complaint about Azure VMs is the shortage of images. AWS has a huge number of images for deep learning, so you can basically just start the instance and you are set to go. Azure only has a few such images, and they still required considerable configuration and installation of packages, which was made especially difficult by the fact that the instance kept shutting down.

I may use Azure on-demand VMs in the future, but the spot instances were largely useless.

Labels: machine_learning , azure
