IMU Corrector Tutorial

Uncomment the following lines if you are running this script on Google Colab:

# !pip install pypose
# !pip install pykitti

In this tutorial, we implement a simple IMUCorrector using torch.nn modules and pypose.module.IMUPreintegrator. The IMUCorrector takes noisy IMU sensor readings as input and outputs the corrected IMU integration result. In that sense, IMUCorrector is an improved IMUPreintegrator.

We will show that pypose.module.IMUPreintegrator can be combined smoothly with network training.
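
To see why correcting the raw readings before integration matters, here is a minimal one-dimensional sketch in plain Python. It is not the PyPose integrator: the function integrate_1d, the sample period, and the bias value are illustrative assumptions. It double-integrates accelerometer samples and shows how a small constant bias grows into position drift, which disappears once the bias is subtracted.

```python
def integrate_1d(acc, dt, pos=0.0, vel=0.0):
    """Integrate 1-D acceleration samples into (position, velocity).

    Simplified stand-in for IMU integration: no rotation, no gravity.
    """
    for a in acc:
        pos += vel * dt + 0.5 * a * dt * dt  # constant acceleration within a step
        vel += a * dt
    return pos, vel


dt = 0.1                     # sample period in seconds (assumed)
true_acc = [0.0] * 100       # the sensor is actually at rest for 10 s
bias = 0.1                   # hypothetical constant accelerometer bias, m/s^2
measured = [a + bias for a in true_acc]

drift_pos, _ = integrate_1d(measured, dt)
corrected = [a - bias for a in measured]   # what an ideal corrector would output
fixed_pos, _ = integrate_1d(corrected, dt)

print(f"position drift with bias:  {drift_pos:.3f} m")
print(f"position after correction: {fixed_pos:.3f} m")
```

With these assumed numbers the uncorrected estimate drifts by roughly 0.5 * 0.1 * 10^2 = 5 m in ten seconds, which is the kind of error a learned correction is meant to reduce.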

Skip the first two parts if you have already seen them in the IMU integrator tutorial.

import torch
import pykitti
import numpy as np
import pypose as pp
from torch import nn
import tqdm, argparse
from datetime import datetime
import torch.utils.data as Data
from torch.optim.lr_scheduler import ReduceLROnPlateau
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
from matplotlib.collections import PatchCollection

1. Dataset Definition

First, we define the KITTI_IMU dataset as a torch.utils.data.Dataset for easy usage. We use the pykitti package, which provides a minimal set of tools for working with the KITTI datasets. To access a data sequence, use:

dataset = pykitti.raw(root, dataname, drive)

Some of the data attributes used below are:

  • dataset.timestamps: Timestamps are parsed into a list of datetime objects

  • dataset.oxts: List of OXTS packets and 6-dof poses as named tuples

For more details about the data format, please refer to the pykitti GitHub page.
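
As a small illustration of how a list of datetime objects turns into per-frame time steps, the sketch below subtracts consecutive timestamps directly. The helper name frame_intervals and the sample timestamps are hypothetical; the dataset class in this tutorial computes the same quantity from datetime.timestamp differences.

```python
from datetime import datetime, timedelta


def frame_intervals(timestamps):
    """Return the time step in seconds between consecutive datetime stamps."""
    return [(t1 - t0).total_seconds()
            for t0, t1 in zip(timestamps[:-1], timestamps[1:])]


# Hypothetical timestamps sampled every 100 ms, standing in for dataset.timestamps
start = datetime(2011, 9, 26, 13, 2, 25)
stamps = [start + timedelta(milliseconds=100 * k) for k in range(4)]

dt = frame_intervals(stamps)
print(len(stamps), "timestamps ->", len(dt), "intervals:", dt)
```

N timestamps yield N - 1 intervals, which is why the sequence length in the dataset class is one less than the number of timestamps.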

A sequence is separated into many segments. The number of segments is controlled by step_size, the stride between the start frames of consecutive segments. Each segment returns measurements such as dt, acc, and gyro for a number of frames defined by duration.

class KITTI_IMU(Data.Dataset):
    def __init__(self, root, dataname, drive, duration=10, step_size=1, mode='train'):
        super().__init__()
        self.duration = duration
        self.data = pykitti.raw(root, dataname, drive)
        self.seq_len = len(self.data.timestamps) - 1
        assert mode in ['evaluate', 'train',
                        'test'], "{} mode is not supported.".format(mode)

        self.dt = torch.tensor([datetime.timestamp(self.data.timestamps[i+1]) -
                               datetime.timestamp(self.data.timestamps[i])
                               for i in range(self.seq_len)])
        self.gyro = torch.tensor([[self.data.oxts[i].packet.wx,
                                   self.data.oxts[i].packet.wy,
                                   self.data.oxts[i].packet.wz]
                                   for i in range(self.seq_len)])
        self.acc = torch.tensor([[self.data.oxts[i].packet.ax,
                                  self.data.oxts[i].packet.ay,
                                  self.data.oxts[i].packet.az]
                                  for i in range(self.seq_len)])
        self.gt_rot = pp.euler2SO3(torch.tensor([[self.data.oxts[i].packet.roll,
                                                  self.data.oxts[i].packet.pitch,
                                                  self.data.oxts[i].packet.yaw]
                                                  for i in range(self.seq_len)]))
        self.gt_vel = self.gt_rot @ torch.tensor([[self.data.oxts[i].packet.vf,
                                                   self.data.oxts[i].packet.vl,
                                                   self.data.oxts[i].packet.vu]
                                                   for i in range(self.seq_len)])
        self.gt_pos = torch.tensor(
            np.array([self.data.oxts[i].T_w_imu[0:3, 3] for i in range(self.seq_len)]))

        start_frame = 0
        end_frame = self.seq_len
        if mode == 'train':
            end_frame = np.floor(self.seq_len * 0.5).astype(int)
        elif mode == 'test':
            start_frame = np.floor(self.seq_len * 0.5).astype(int)

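        # Start each segment inside the selected split, so that 'test' covers
        # the second half of the sequence instead of re-using the training frames.
        self.index_map = [i for i in range(
            start_frame, end_frame - self.duration, step_size)]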

    def __len__(self):
        return len(self.index_map)

    def __getitem__(self, i):
        frame_id = self.index_map[i]
        end_frame_id = frame_id + self.duration
        return {
            'dt': self.dt[frame_id: end_frame_id],
            'acc': self.acc[frame_id: end_frame_id],
            'gyro': self.gyro[frame_id: end_frame_id],
            'gt_pos': self.gt_pos[frame_id+1: end_frame_id+1],
            'gt_rot': self.gt_rot[frame_id+1: end_frame_id+1],
            'gt_vel': self.gt_vel[frame_id+1: end_frame_id+1],
            'init_pos': self.gt_pos[frame_id][None, ...],
            # TODO: the init rotation might be used in gravity compensation
            'init_rot': self.gt_rot[frame_id: end_frame_id],
            'init_vel': self.gt_vel[frame_id][None, ...],
        }

    def get_init_value(self):
        return {'pos': self.gt_pos[:1],
                'rot': self.gt_rot[:1],
                'vel': self.gt_vel[:1]}
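
The effect of duration and step_size on the number of segments can be checked with a few lines of plain Python. This is a sketch that mirrors the range expression used for index_map; the helper name segment_starts and the example numbers are assumptions made for illustration.

```python
def segment_starts(n_frames, duration, step_size):
    """Start indices of all segments of `duration` frames in a sequence.

    The stop value leaves one extra frame at the end because the ground
    truth of a segment is shifted by one frame relative to its inputs.
    """
    return list(range(0, n_frames - duration, step_size))


starts = segment_starts(n_frames=25, duration=10, step_size=5)
for s in starts:
    print(f"inputs: frames {s}..{s + 9}, targets: frames {s + 1}..{s + 10}")

# A smaller step_size gives more, heavily overlapping segments
print(len(segment_starts(25, 10, 1)), "segments with step_size=1")
```

With step_size=1 consecutive segments overlap in all but one frame, so a short sequence still produces many training samples.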

2. Utility Functions

These are several utility functions. You can skip to the parameter definitions and come back when necessary.

imu_collate

imu_collate is used for batch operations; it stacks the data of multiple segments into batched tensors.

def imu_collate(data):
    acc = torch.stack([d['acc'] for d in data])
    gyro = torch.stack([d['gyro'] for d in data])

    gt_pos = torch.stack([d['gt_pos'] for d in data])
    gt_rot = torch.stack([d['gt_rot'] for d in data])
    gt_vel = torch.stack([d['gt_vel'] for d in data])

    init_pos = torch.stack([d['init_pos'] for d in data])
    init_rot = torch.stack([d['init_rot'] for d in data])
    init_vel = torch.stack([d['init_vel'] for d in data])

    dt = torch.stack([d['dt'] for d in data]).unsqueeze(-1)

    return {
        'dt': dt,
        'acc': acc,
        'gyro': gyro,

        'gt_pos': gt_pos,
        'gt_vel': gt_vel,
        'gt_rot': gt_rot,

        'init_pos': init_pos,
        'init_vel': init_vel,
        'init_rot': init_rot,
    }

move_to

move_to is used to move different objects (tensors, and dicts or lists of them) to the given device, such as a CUDA device.

def move_to(obj, device):
    if torch.is_tensor(obj):
        return obj.to(device)
    elif isinstance(obj, dict):
        res = {}
        for k, v in obj.items():
            res[k] = move_to(v, device)
        return res
    elif isinstance(obj, list):
        res = []
        for v in obj:
            res.append(move_to(v, device))
        return res
    else:
        raise TypeError("Invalid type for move_to", obj)

plot_gaussian

plot_gaussian is used to plot ellipses that visualize uncertainty; a bigger ellipse means bigger uncertainty.

def plot_gaussian(ax, means, covs, color=None, sigma=3):
    ''' Set specific color to show edges, otherwise same with facecolor.'''
    ellipses = []
    for i in range(len(means)):
        eigvals, eigvecs = np.linalg.eig(covs[i])
        axis = np.sqrt(eigvals) * sigma
        slope = eigvecs[1][0] / eigvecs[1][1]
        angle = 180.0 * np.arctan(slope) / np.pi
        ellipses.append(Ellipse(means[i, 0:2], axis[0], axis[1], angle=angle))
    ax.add_collection(PatchCollection(ellipses, edgecolors=color, linewidth=1))
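
For a 2x2 covariance the eigen-decomposition has a closed form, which makes the relation between a covariance matrix and its ellipse easy to verify by hand. The sketch below uses only the math module; cov_to_ellipse is a hypothetical helper and is not part of the tutorial code, which relies on np.linalg.eig.

```python
import math


def cov_to_ellipse(cov, sigma=3.0):
    """Return (semi_major, semi_minor, angle_deg) of the sigma-level ellipse
    for a symmetric 2x2 covariance given as [[a, b], [b, c]]."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    lam_max, lam_min = mean + radius, mean - radius  # eigenvalues
    # Orientation of the major axis, measured from the x axis
    angle = 0.5 * math.atan2(2.0 * b, a - c)
    return (sigma * math.sqrt(lam_max),
            sigma * math.sqrt(max(lam_min, 0.0)),
            math.degrees(angle))


print(cov_to_ellipse([[4.0, 0.0], [0.0, 1.0]], sigma=1.0))  # axis-aligned
print(cov_to_ellipse([[2.0, 1.0], [1.0, 2.0]], sigma=1.0))  # correlated x and y
```

The returned lengths are semi-axes. matplotlib's Ellipse takes the full width and height, so they would be doubled before being passed to it.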

3. Define IMU Corrector

Here we define the IMUCorrector module. It has two parts, net and imu:
  • net is a network that resembles an autoencoder. It consists of a sequence of linear layers and activation layers, and it returns a correction to the IMU measurements. Adding this correction to the original IMU sensor data gives the corrected sensor readings.

  • imu is a pypose.module.IMUPreintegrator. Using the corrected sensor readings from the previous step as the input to the IMUPreintegrator, we can get a more accurate IMU integration result.

class IMUCorrector(nn.Module):
    def __init__(self, size_list= [6, 64, 128, 128, 128, 6]):
        super().__init__()
        layers = []
        self.size_list = size_list
        for i in range(len(size_list) - 2):
            layers.append(nn.Linear(size_list[i], size_list[i+1]))
            layers.append(nn.GELU())
        layers.append(nn.Linear(size_list[-2], size_list[-1]))
        self.net = nn.Sequential(*layers)
        self.imu = pp.module.IMUPreintegrator(reset=True, prop_cov=False)

    def forward(self, data, init_state):
        feature = torch.cat([data["acc"], data["gyro"]], dim = -1)
        B, F = feature.shape[:2]

        output = self.net(feature.reshape(B*F,6)).reshape(B, F, 6)
        corrected_acc = output[...,:3] + data["acc"]
        corrected_gyro = output[...,3:] + data["gyro"]

        return self.imu(init_state = init_state,
                        dt = data['dt'],
                        gyro = corrected_gyro,
                        acc = corrected_acc,
                        rot = data['gt_rot'].contiguous())
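
In equation form, the forward pass above can be summarized as follows. The symbols are notation introduced here for illustration: f_θ denotes net, and a_k and ω_k are the raw accelerometer and gyroscope readings of frame k.

```latex
\begin{aligned}
[\Delta a_k,\ \Delta\omega_k] &= f_\theta\big([a_k,\ \omega_k]\big), \\
\hat{a}_k &= a_k + \Delta a_k, \\
\hat{\omega}_k &= \omega_k + \Delta\omega_k .
\end{aligned}
```

Because the frames are flattened to shape (B*F, 6) before entering net, each frame is corrected independently of its neighbours. The corrected readings are then passed to the IMUPreintegrator together with dt and the initial state.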

4. Define the Loss Function

The loss function consists of two parts: position loss and rotation loss. Both are evaluated on the last frame of each segment.

For the position loss, we use torch.nn.functional.mse_loss, which is the mean squared error. See the PyTorch documentation for more detail.

For the rotation loss, we first compute the pose error between the output rotation and the ground truth rotation, then take the norm of its Lie algebra representation.

Finally, we add the two losses together as our combined loss.

def get_loss(inte_state, data):
    pos_loss = torch.nn.functional.mse_loss(inte_state['pos'][:,-1,:], data['gt_pos'][:,-1,:])
    rot_loss = (data['gt_rot'][:,-1,:] * inte_state['rot'][:,-1,:].Inv()).Log().norm()

    loss = pos_loss + rot_loss
    return loss, {'pos_loss': pos_loss, 'rot_loss': rot_loss}
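
To make the rotation term concrete, the sketch below recomputes the combined loss for a single final frame in plain Python. It is an illustration under stated assumptions, not the PyPose implementation: quaternions are stored as (x, y, z, w) tuples, all helper names are hypothetical, and the norm of the Lie algebra element is computed as the rotation angle of the relative rotation.

```python
import math


def quat_mul(q, r):
    """Hamilton product of quaternions stored as (x, y, z, w)."""
    x1, y1, z1, w1 = q
    x2, y2, z2, w2 = r
    return (w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
            w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2)


def quat_inv(q):
    """Inverse of a unit quaternion (its conjugate)."""
    x, y, z, w = q
    return (-x, -y, -z, w)


def rotation_angle(q):
    """Norm of the rotation vector Log(q): the rotation angle in radians."""
    x, y, z, w = q
    return 2.0 * math.atan2(math.sqrt(x * x + y * y + z * z), abs(w))


def combined_loss(pos, gt_pos, rot, gt_rot):
    """Position MSE plus rotation angle error for one final frame."""
    pos_loss = sum((p - g) ** 2 for p, g in zip(pos, gt_pos)) / len(pos)
    rot_loss = rotation_angle(quat_mul(gt_rot, quat_inv(rot)))
    return pos_loss + rot_loss, pos_loss, rot_loss


def rot_z(angle):
    """Unit quaternion for a rotation of `angle` radians about the z axis."""
    return (0.0, 0.0, math.sin(angle / 2), math.cos(angle / 2))


loss, pos_loss, rot_loss = combined_loss(
    pos=[1.0, 2.0, 3.0], gt_pos=[1.0, 2.0, 4.0],
    rot=rot_z(0.1), gt_rot=rot_z(0.3))
print(f"pos_loss={pos_loss:.4f} rot_loss={rot_loss:.4f} loss={loss:.4f}")
```

Here the estimate is off by 1 m along z and by 0.2 rad in yaw, so the two terms are 1/3 and 0.2. Note that get_loss applies .norm() to the whole batch of rotation errors at once, whereas this sketch handles a single sample.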

5. Define the Training Process

This is the training process, which has three steps:
  1. Run the forward function to get the current network output

  2. Collect the loss, which is needed for the backward pass in Step 3

  3. Get the gradients and do the optimization

def train(network, train_loader, epoch, optimizer, device="cuda:0"):
    """
    Train the network for one epoch using the specified data loader.
    Returns the average training loss of the epoch.
    """
    network.train()
    running_loss = 0
    t_range = tqdm.tqdm(train_loader)
    for i, data in enumerate(t_range):

        # Step 1: Run forward function
        data = move_to(data, device)
        init_state = {
            "pos": data['init_pos'],
            "rot": data['init_rot'][:,:1,:],
            "vel": data['init_vel'],}
        state = network(data, init_state)

        # Step 2: Collect loss
        losses, _ = get_loss(state, data)
        running_loss += losses.item()

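        # Step 3: Get gradients and do optimization
        t_range.set_description(f'iteration: {i:04d}, losses: {losses:.06f}')
        t_range.refresh()
        optimizer.zero_grad()  # clear gradients left over from the previous iteration
        losses.backward()
        optimizer.step()

    # Average over the number of batches (i is the last zero-based index)
    return running_loss / (i + 1)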
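
The three steps do not depend on the IMU model. The following sketch in plain Python fits a single hypothetical bias parameter, with a hand-written gradient in place of autograd and an optimizer, to show the same forward, loss, and update structure and how the epoch loss is averaged over batches. The function train_epoch and the data are assumptions made for illustration.

```python
def train_epoch(bias, batches, lr=0.1):
    """One epoch of gradient descent on a scalar bias estimate.

    Each batch is a list of (measured, true) pairs and the model predicts
    measured - bias. Returns the updated bias and the mean batch loss.
    """
    running_loss = 0.0
    for batch in batches:
        # Step 1: forward pass
        errors = [(m - bias) - t for m, t in batch]

        # Step 2: collect the loss (mean squared error)
        loss = sum(e * e for e in errors) / len(errors)
        running_loss += loss

        # Step 3: compute a fresh gradient for this batch and update
        grad = sum(-2.0 * e for e in errors) / len(errors)
        bias -= lr * grad

    return bias, running_loss / len(batches)


# Hypothetical data: every measurement is the true value plus a bias of 0.5
batches = [[(t + 0.5, t) for t in (0.0, 1.0)] for _ in range(5)]

bias = 0.0
for epoch in range(20):
    bias, mean_loss = train_epoch(bias, batches)
print(f"estimated bias: {bias:.4f}, mean loss: {mean_loss:.2e}")
```

The gradient is computed from scratch for every batch, and the running loss is divided by the number of batches rather than by the index of the last batch.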

6. Define the Testing Process

This is the testing process, which has two steps:
  1. Run the forward function to get the current network output

  2. Collect the loss to evaluate the network performance

def test(network, loader, device = "cuda:0"):
    network.eval()
    with torch.no_grad():
        running_loss = 0
        for i, data in enumerate(tqdm.tqdm(loader)):

            # Step 1: Run forward function
            data = move_to(data, device)
            init_state = {
                "pos": data['init_pos'],
                "rot": data['init_rot'][:,:1,:],
                "vel": data['init_vel'],}
            state = network(data, init_state)

            # Step 2: Collect loss
            losses, _ = get_loss(state, data)
            running_loss += losses.item()

        # Average over the number of batches (i is the last zero-based index)
        mean_loss = running_loss / (i + 1)
        print("the running loss of the test set %0.6f" % mean_loss)

    return mean_loss

7. Define Parameters

Here we define all the parameters we will use. See the help message for the usage of each parameter.

parser = argparse.ArgumentParser()
parser.add_argument("--device",
                    type=str,
                    default='cuda:0',
                    help="cuda or cpu")
parser.add_argument("--batch-size",
                    type=int,
                    default=4,
                    help="batch size")
parser.add_argument("--max_epoches",
                    type=int,
                    default=100,
                    help="maximum number of training epochs")
parser.add_argument("--dataroot",
                    type=str,
                    default='../dataset',
                    help="dataset location downloaded")
parser.add_argument("--dataname",
                    type=str,
                    default='2011_09_26',
                    help="dataset name")
parser.add_argument("--datadrive",
                    nargs='+',
                    type=str,
                    default=[ "0001"],
                    help="data sequences")
parser.add_argument('--load_ckpt',
                    default=False,
                    action="store_true")
args, unknown = parser.parse_known_args(); print(args)
Namespace(device='cuda:0', batch_size=4, max_epoches=100, dataroot='../dataset', dataname='2011_09_26', datadrive=['0001'], load_ckpt=False)
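
Two details of this parser are easy to miss: dashes in a flag name become underscores in the attribute name, and parse_known_args returns unrecognized arguments instead of raising an error, which helps when the script runs in an environment that passes its own arguments. The sketch below demonstrates both with an explicit argument list; the flag --unknown-flag is made up for the demonstration.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--batch-size", type=int, default=4, help="batch size")
parser.add_argument("--datadrive", nargs="+", type=str, default=["0001"],
                    help="data sequences")
parser.add_argument("--load_ckpt", default=False, action="store_true")

# Parse an explicit list instead of sys.argv so the example is reproducible
args, unknown = parser.parse_known_args(
    ["--batch-size", "8", "--datadrive", "0001", "0002", "--unknown-flag"])

print(args.batch_size)  # "--batch-size" is read back as args.batch_size
print(args.datadrive)   # nargs="+" collects one or more values into a list
print(unknown)          # arguments the parser did not recognize
```

Only the first entry of datadrive is used when the datasets are built in the next section.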

8. Define Dataloaders

train_dataset = KITTI_IMU(args.dataroot, args.dataname, args.datadrive[0],
                          duration=10, mode='train')
test_dataset = KITTI_IMU(args.dataroot, args.dataname, args.datadrive[0],
                         duration=10, mode='test')
train_loader = Data.DataLoader(dataset=train_dataset, batch_size=args.batch_size,
                               collate_fn=imu_collate, shuffle=True)
test_loader = Data.DataLoader(dataset=test_dataset, batch_size=args.batch_size,
                              collate_fn=imu_collate, shuffle=False)

9. Main Training Loop

Here we run our main training loop. First, as usual in PyTorch, we define the network, optimizer, and scheduler.

If you are not familiar with the process of training a network, we recommend reading one of the PyTorch tutorials.

For each epoch, we run both training and testing once and collect the running loss. As the output messages below show, the running loss decreases, which indicates that our IMUCorrector is working.
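
The scheduler used in the loop below lowers the learning rate when the monitored loss stops improving. The following plain-Python sketch captures that idea in simplified form. PlateauSketch is a hypothetical class, and it deliberately ignores the threshold, cooldown, and minimum learning rate options of the real ReduceLROnPlateau.

```python
class PlateauSketch:
    """Simplified plateau scheduler: cut the rate after too many bad epochs."""

    def __init__(self, lr, factor=0.1, patience=10):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.num_bad_epochs = 0

    def step(self, metric):
        """Record one epoch's metric and return the (possibly reduced) rate."""
        if metric < self.best:
            self.best = metric
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        if self.num_bad_epochs > self.patience:
            self.lr *= self.factor
            self.num_bad_epochs = 0
        return self.lr


sketch = PlateauSketch(lr=1.0, factor=0.1, patience=2)
history = [sketch.step(loss) for loss in [0.30, 0.28, 0.28, 0.29, 0.28, 0.27]]
print(history)
```

In this toy run the rate is reduced once, on the third consecutive epoch without improvement. With a steadily decreasing loss, as in this tutorial, the rate stays unchanged.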

network = IMUCorrector().to(args.device)
optimizer = torch.optim.Adam(network.parameters(), lr = 5e-6)
scheduler = ReduceLROnPlateau(optimizer, 'min', factor = 0.1, patience = 10) # default setup

for epoch_i in range(args.max_epoches):
    train_loss = train(network, train_loader, epoch_i, optimizer, device = args.device)
    test_loss = test(network, test_loader, device = args.device)
    scheduler.step(train_loss)
    print("train loss: %f test loss: %f "%(train_loss, test_loss))
the running loss of the test set 0.313447
train loss: 0.319110 test loss: 0.313447

the running loss of the test set 0.295152
train loss: 0.302238 test loss: 0.295152

the running loss of the test set 0.275216
train loss: 0.283259 test loss: 0.275216

the running loss of the test set 0.254224
train loss: 0.263338 test loss: 0.254224

the running loss of the test set 0.232710
train loss: 0.242074 test loss: 0.232710

the running loss of the test set 0.211048
train loss: 0.220632 test loss: 0.211048

the running loss of the test set 0.189462
train loss: 0.199242 test loss: 0.189462

the running loss of the test set 0.168015
train loss: 0.178238 test loss: 0.168015

the running loss of the test set 0.146862
train loss: 0.157211 test loss: 0.146862

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.130732:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.130403:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.127034:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.134334:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.128720:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.126910:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.118896:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.120953:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.120953:  73%|#######2  | 8/11 [00:00<00:00, 79.09it/s]
iteration: 0008, losses: 0.126049:  73%|#######2  | 8/11 [00:00<00:00, 79.09it/s]
iteration: 0009, losses: 0.116705:  73%|#######2  | 8/11 [00:00<00:00, 79.09it/s]
iteration: 0010, losses: 0.103275:  73%|#######2  | 8/11 [00:00<00:00, 79.09it/s]
iteration: 0010, losses: 0.103275: 100%|##########| 11/11 [00:00<00:00, 91.32it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 182.18it/s]
the running loss of the test set 0.126329
train loss: 0.136401 test loss: 0.126329

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.112587:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.112408:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.113911:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.107583:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.117529:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.109132:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.098946:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.097891:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.104212:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.104212:  82%|########1 | 9/11 [00:00<00:00, 84.13it/s]
iteration: 0009, losses: 0.104201:  82%|########1 | 9/11 [00:00<00:00, 84.13it/s]
iteration: 0010, losses: 0.087202:  82%|########1 | 9/11 [00:00<00:00, 84.13it/s]
iteration: 0010, losses: 0.087202: 100%|##########| 11/11 [00:00<00:00, 91.81it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 183.63it/s]
the running loss of the test set 0.106817
train loss: 0.116560 test loss: 0.106817

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.102293:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.090598:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.092002:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.089443:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.095486:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.089027:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.088844:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.085642:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.086503:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0009, losses: 0.082710:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0009, losses: 0.082710:  91%|######### | 10/11 [00:00<00:00, 85.78it/s]
iteration: 0010, losses: 0.078762:  91%|######### | 10/11 [00:00<00:00, 85.78it/s]
iteration: 0010, losses: 0.078762: 100%|##########| 11/11 [00:00<00:00, 88.15it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 183.03it/s]
the running loss of the test set 0.088984
train loss: 0.098131 test loss: 0.088984

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.076650:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.078412:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.079756:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.079526:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.081404:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.069438:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.076541:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.071877:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.070603:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.070603:  82%|########1 | 9/11 [00:00<00:00, 83.82it/s]
iteration: 0009, losses: 0.070901:  82%|########1 | 9/11 [00:00<00:00, 83.82it/s]
iteration: 0010, losses: 0.059825:  82%|########1 | 9/11 [00:00<00:00, 83.82it/s]
iteration: 0010, losses: 0.059825: 100%|##########| 11/11 [00:00<00:00, 88.91it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 361.45it/s]
the running loss of the test set 0.074079
train loss: 0.081493 test loss: 0.074079

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.068454:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.070304:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.063605:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.066925:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.062110:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.069830:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.065209:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.051501:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.061460:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.061460:  82%|########1 | 9/11 [00:00<00:00, 88.47it/s]
iteration: 0009, losses: 0.061530:  82%|########1 | 9/11 [00:00<00:00, 88.47it/s]
iteration: 0010, losses: 0.047427:  82%|########1 | 9/11 [00:00<00:00, 88.47it/s]
iteration: 0010, losses: 0.047427: 100%|##########| 11/11 [00:00<00:00, 79.23it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 288.88it/s]
the running loss of the test set 0.064345
train loss: 0.068835 test loss: 0.064345

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.066862:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.061448:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.050301:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.065330:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.061371:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.051886:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.062064:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.057578:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.047592:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.047592:  82%|########1 | 9/11 [00:00<00:00, 87.37it/s]
iteration: 0009, losses: 0.048645:  82%|########1 | 9/11 [00:00<00:00, 87.37it/s]
iteration: 0010, losses: 0.052881:  82%|########1 | 9/11 [00:00<00:00, 87.37it/s]
iteration: 0010, losses: 0.052881: 100%|##########| 11/11 [00:00<00:00, 93.04it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 169.26it/s]
the running loss of the test set 0.062480
train loss: 0.062596 test loss: 0.062480

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.046240:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.054048:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.059302:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.060702:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.063208:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.063234:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.055930:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.056020:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.060961:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0009, losses: 0.062120:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0010, losses: 0.066409:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0010, losses: 0.066409: 100%|##########| 11/11 [00:00<00:00, 91.10it/s]
iteration: 0010, losses: 0.066409: 100%|##########| 11/11 [00:00<00:00, 90.87it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 331.40it/s]
the running loss of the test set 0.068396
train loss: 0.064817 test loss: 0.068396

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.061323:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.062432:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.054127:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.064007:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.076985:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.066494:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.069142:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.069142:  64%|######3   | 7/11 [00:00<00:00, 62.26it/s]
iteration: 0007, losses: 0.061364:  64%|######3   | 7/11 [00:00<00:00, 62.26it/s]
iteration: 0008, losses: 0.069430:  64%|######3   | 7/11 [00:00<00:00, 62.26it/s]
iteration: 0009, losses: 0.075199:  64%|######3   | 7/11 [00:00<00:00, 62.26it/s]
iteration: 0010, losses: 0.063954:  64%|######3   | 7/11 [00:00<00:00, 62.26it/s]
iteration: 0010, losses: 0.063954: 100%|##########| 11/11 [00:00<00:00, 78.38it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 179.70it/s]
the running loss of the test set 0.078923
train loss: 0.072446 test loss: 0.078923

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.076294:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.074770:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.076390:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.079386:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.073033:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.073972:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.071823:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.077141:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.078082:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.078082:  82%|########1 | 9/11 [00:00<00:00, 85.78it/s]
iteration: 0009, losses: 0.082783:  82%|########1 | 9/11 [00:00<00:00, 85.78it/s]
iteration: 0010, losses: 0.076334:  82%|########1 | 9/11 [00:00<00:00, 85.78it/s]
iteration: 0010, losses: 0.076334: 100%|##########| 11/11 [00:00<00:00, 80.27it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 350.25it/s]
the running loss of the test set 0.091431
train loss: 0.084001 test loss: 0.091431

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.078563:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.085481:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.085929:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.082398:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.079654:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.085445:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.094175:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.094175:  64%|######3   | 7/11 [00:00<00:00, 65.45it/s]
iteration: 0007, losses: 0.095211:  64%|######3   | 7/11 [00:00<00:00, 65.45it/s]
iteration: 0008, losses: 0.092269:  64%|######3   | 7/11 [00:00<00:00, 65.45it/s]
iteration: 0009, losses: 0.094152:  64%|######3   | 7/11 [00:00<00:00, 65.45it/s]
iteration: 0010, losses: 0.092603:  64%|######3   | 7/11 [00:00<00:00, 65.45it/s]
iteration: 0010, losses: 0.092603: 100%|##########| 11/11 [00:00<00:00, 65.90it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 367.93it/s]
the running loss of the test set 0.104514
train loss: 0.096588 test loss: 0.104514

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.096337:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.096965:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.096236:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.094995:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.098197:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.103396:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.107509:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.102930:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.102930:  73%|#######2  | 8/11 [00:00<00:00, 79.23it/s]
iteration: 0008, losses: 0.103733:  73%|#######2  | 8/11 [00:00<00:00, 79.23it/s]
iteration: 0009, losses: 0.101060:  73%|#######2  | 8/11 [00:00<00:00, 79.23it/s]
iteration: 0010, losses: 0.092422:  73%|#######2  | 8/11 [00:00<00:00, 79.23it/s]
iteration: 0010, losses: 0.092422: 100%|##########| 11/11 [00:00<00:00, 91.14it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 177.14it/s]
the running loss of the test set 0.117363
train loss: 0.109378 test loss: 0.117363

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.105416:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.107201:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.112628:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.111003:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.105827:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.109192:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.115090:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.112302:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.111139:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.111139:  82%|########1 | 9/11 [00:00<00:00, 83.89it/s]
iteration: 0009, losses: 0.116685:  82%|########1 | 9/11 [00:00<00:00, 83.89it/s]
iteration: 0010, losses: 0.111876:  82%|########1 | 9/11 [00:00<00:00, 83.89it/s]
iteration: 0010, losses: 0.111876: 100%|##########| 11/11 [00:00<00:00, 91.02it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 190.89it/s]
the running loss of the test set 0.129476
train loss: 0.121836 test loss: 0.129476

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.116557:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.120670:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.116449:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.122299:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.123480:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.118714:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.123100:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.123865:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.126611:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0009, losses: 0.124369:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0010, losses: 0.116838:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0010, losses: 0.116838: 100%|##########| 11/11 [00:00<00:00, 106.88it/s]
iteration: 0010, losses: 0.116838: 100%|##########| 11/11 [00:00<00:00, 106.73it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 190.22it/s]
the running loss of the test set 0.140495
train loss: 0.133295 test loss: 0.140495

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.126652:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.130636:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.131283:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.128645:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.137103:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.129446:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.134166:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.131614:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.130953:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0009, losses: 0.138736:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0009, losses: 0.138736:  91%|######### | 10/11 [00:00<00:00, 99.26it/s]
iteration: 0010, losses: 0.115208:  91%|######### | 10/11 [00:00<00:00, 99.26it/s]
iteration: 0010, losses: 0.115208: 100%|##########| 11/11 [00:00<00:00, 90.88it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 365.81it/s]
the running loss of the test set 0.150146
train loss: 0.143444 test loss: 0.150146

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.135979:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.133076:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.143011:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.138150:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.135708:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.142212:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.144436:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.140820:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.140820:  73%|#######2  | 8/11 [00:00<00:00, 70.75it/s]
iteration: 0008, losses: 0.139386:  73%|#######2  | 8/11 [00:00<00:00, 70.75it/s]
iteration: 0009, losses: 0.144428:  73%|#######2  | 8/11 [00:00<00:00, 70.75it/s]
iteration: 0010, losses: 0.126220:  73%|#######2  | 8/11 [00:00<00:00, 70.75it/s]
iteration: 0010, losses: 0.126220: 100%|##########| 11/11 [00:00<00:00, 79.75it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 185.07it/s]
the running loss of the test set 0.158207
train loss: 0.152343 test loss: 0.158207

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.140088:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.147604:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.143452:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.143363:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.145350:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.144224:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.145488:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.145736:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.152367:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0009, losses: 0.153763:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0010, losses: 0.134793:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0010, losses: 0.134793: 100%|##########| 11/11 [00:00<00:00, 91.10it/s]
iteration: 0010, losses: 0.134793: 100%|##########| 11/11 [00:00<00:00, 90.98it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 254.76it/s]
the running loss of the test set 0.164489
train loss: 0.159623 test loss: 0.164489

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.150128:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.146342:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.150295:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.155500:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.150649:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.149775:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.152384:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.153298:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.153298:  73%|#######2  | 8/11 [00:00<00:00, 79.81it/s]
iteration: 0008, losses: 0.154586:  73%|#######2  | 8/11 [00:00<00:00, 79.81it/s]
iteration: 0009, losses: 0.150456:  73%|#######2  | 8/11 [00:00<00:00, 79.81it/s]
iteration: 0010, losses: 0.135508:  73%|#######2  | 8/11 [00:00<00:00, 79.81it/s]
iteration: 0010, losses: 0.135508: 100%|##########| 11/11 [00:00<00:00, 89.12it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 238.45it/s]
the running loss of the test set 0.168834
train loss: 0.164892 test loss: 0.168834

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.156171:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.154033:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.156807:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.152150:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.153578:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.152711:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.153379:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.153377:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.155579:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.155579:  82%|########1 | 9/11 [00:00<00:00, 69.70it/s]
iteration: 0009, losses: 0.151356:  82%|########1 | 9/11 [00:00<00:00, 69.70it/s]
iteration: 0010, losses: 0.132382:  82%|########1 | 9/11 [00:00<00:00, 69.70it/s]
iteration: 0010, losses: 0.132382: 100%|##########| 11/11 [00:00<00:00, 74.65it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 348.59it/s]
the running loss of the test set 0.169061
train loss: 0.167152 test loss: 0.169061

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.156114:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.150304:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.155756:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.153195:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.151565:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.153062:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.156645:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.149183:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.158133:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.158133:  82%|########1 | 9/11 [00:00<00:00, 86.02it/s]
iteration: 0009, losses: 0.151356:  82%|########1 | 9/11 [00:00<00:00, 86.02it/s]
iteration: 0010, losses: 0.139301:  82%|########1 | 9/11 [00:00<00:00, 86.02it/s]
iteration: 0010, losses: 0.139301: 100%|##########| 11/11 [00:00<00:00, 92.88it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 191.72it/s]
the running loss of the test set 0.169075
train loss: 0.167462 test loss: 0.169075

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.153920:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.155429:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.152408:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.158795:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.155310:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.152439:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.150898:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.154012:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.153418:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.153418:  82%|########1 | 9/11 [00:00<00:00, 84.11it/s]
iteration: 0009, losses: 0.154649:  82%|########1 | 9/11 [00:00<00:00, 84.11it/s]
iteration: 0010, losses: 0.130191:  82%|########1 | 9/11 [00:00<00:00, 84.11it/s]
iteration: 0010, losses: 0.130191: 100%|##########| 11/11 [00:00<00:00, 90.17it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 186.73it/s]
the running loss of the test set 0.168873
train loss: 0.167147 test loss: 0.168873

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.152105:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.157480:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.156845:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.153218:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.157811:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.148738:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.153560:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.149901:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.153833:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0009, losses: 0.151830:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0009, losses: 0.151830:  91%|######### | 10/11 [00:00<00:00, 88.09it/s]
iteration: 0010, losses: 0.134336:  91%|######### | 10/11 [00:00<00:00, 88.09it/s]
iteration: 0010, losses: 0.134336: 100%|##########| 11/11 [00:00<00:00, 91.58it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 243.55it/s]
the running loss of the test set 0.168456
train loss: 0.166966 test loss: 0.168456

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.153396:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.150904:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.151907:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.157253:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.159047:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.151941:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.151354:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.150719:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.150353:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.150353:  82%|########1 | 9/11 [00:00<00:00, 86.01it/s]
iteration: 0009, losses: 0.151921:  82%|########1 | 9/11 [00:00<00:00, 86.01it/s]
iteration: 0010, losses: 0.135562:  82%|########1 | 9/11 [00:00<00:00, 86.01it/s]
iteration: 0010, losses: 0.135562: 100%|##########| 11/11 [00:00<00:00, 82.28it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 244.14it/s]
the running loss of the test set 0.167827
train loss: 0.166436 test loss: 0.167827

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.151329:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.155242:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.153846:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.148325:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.154395:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.153641:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.151978:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.147988:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.150528:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0009, losses: 0.151626:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0009, losses: 0.151626:  91%|######### | 10/11 [00:00<00:00, 99.92it/s]
iteration: 0010, losses: 0.139228:  91%|######### | 10/11 [00:00<00:00, 99.92it/s]
iteration: 0010, losses: 0.139228: 100%|##########| 11/11 [00:00<00:00, 90.89it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 178.55it/s]
the running loss of the test set 0.166991
train loss: 0.165813 test loss: 0.166991

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.150114:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.148636:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.150640:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.150756:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.153690:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.150230:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.151352:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.152860:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.153802:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0009, losses: 0.154445:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0010, losses: 0.131607:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0010, losses: 0.131607: 100%|##########| 11/11 [00:00<00:00, 103.96it/s]
iteration: 0010, losses: 0.131607: 100%|##########| 11/11 [00:00<00:00, 103.81it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 183.89it/s]
the running loss of the test set 0.165956
train loss: 0.164813 test loss: 0.165956

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.152993:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.151913:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.150977:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.156742:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.147761:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.146633:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.151765:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.146716:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.149525:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0009, losses: 0.151221:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0010, losses: 0.130850:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0010, losses: 0.130850: 100%|##########| 11/11 [00:00<00:00, 104.83it/s]
iteration: 0010, losses: 0.130850: 100%|##########| 11/11 [00:00<00:00, 104.63it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 251.43it/s]
the running loss of the test set 0.164734
train loss: 0.163709 test loss: 0.164734

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.152149:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.147092:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.148405:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.152592:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.148223:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.151483:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.148808:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.148334:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.147869:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.147869:  82%|########1 | 9/11 [00:00<00:00, 85.69it/s]
iteration: 0009, losses: 0.145979:  82%|########1 | 9/11 [00:00<00:00, 85.69it/s]
iteration: 0010, losses: 0.134784:  82%|########1 | 9/11 [00:00<00:00, 85.69it/s]
iteration: 0010, losses: 0.134784: 100%|##########| 11/11 [00:00<00:00, 92.84it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 184.61it/s]
the running loss of the test set 0.163337
train loss: 0.162572 test loss: 0.163337

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.151284:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.149129:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.156874:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.144536:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.144972:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.148982:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.146560:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.148403:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.148403:  73%|#######2  | 8/11 [00:00<00:00, 76.60it/s]
iteration: 0008, losses: 0.145420:  73%|#######2  | 8/11 [00:00<00:00, 76.60it/s]
iteration: 0009, losses: 0.145588:  73%|#######2  | 8/11 [00:00<00:00, 76.60it/s]
iteration: 0010, losses: 0.127750:  73%|#######2  | 8/11 [00:00<00:00, 76.60it/s]
iteration: 0010, losses: 0.127750: 100%|##########| 11/11 [00:00<00:00, 79.19it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 233.91it/s]
the running loss of the test set 0.161780
train loss: 0.160950 test loss: 0.161780

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.145351:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.152932:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.148416:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.150428:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.146949:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.148915:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.142098:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.143890:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.144158:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0009, losses: 0.143796:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0010, losses: 0.126251:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0010, losses: 0.126251: 100%|##########| 11/11 [00:00<00:00, 105.90it/s]
iteration: 0010, losses: 0.126251: 100%|##########| 11/11 [00:00<00:00, 105.66it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 186.01it/s]
the running loss of the test set 0.160078
train loss: 0.159319 test loss: 0.160078

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.148096:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.146372:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.147710:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.142027:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.147475:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.145814:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.143676:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.145261:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.145261:  73%|#######2  | 8/11 [00:00<00:00, 79.29it/s]
iteration: 0008, losses: 0.146123:  73%|#######2  | 8/11 [00:00<00:00, 79.29it/s]
iteration: 0009, losses: 0.145551:  73%|#######2  | 8/11 [00:00<00:00, 79.29it/s]
iteration: 0010, losses: 0.124971:  73%|#######2  | 8/11 [00:00<00:00, 79.29it/s]
iteration: 0010, losses: 0.124971: 100%|##########| 11/11 [00:00<00:00, 81.50it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 364.66it/s]
the running loss of the test set 0.159894
train loss: 0.158308 test loss: 0.159894

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.149356:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.142194:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.150209:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.145195:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.142505:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.146016:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.142433:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.142658:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.144366:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.144366:  82%|########1 | 9/11 [00:00<00:00, 83.98it/s]
iteration: 0009, losses: 0.148367:  82%|########1 | 9/11 [00:00<00:00, 83.98it/s]
iteration: 0010, losses: 0.128413:  82%|########1 | 9/11 [00:00<00:00, 83.98it/s]
iteration: 0010, losses: 0.128413: 100%|##########| 11/11 [00:00<00:00, 90.11it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 184.24it/s]
the running loss of the test set 0.159699
train loss: 0.158171 test loss: 0.159699

  0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0000, losses: 0.144571:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0001, losses: 0.142122:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0002, losses: 0.144863:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0003, losses: 0.144839:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0004, losses: 0.142969:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0005, losses: 0.146011:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0006, losses: 0.149970:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0007, losses: 0.148549:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.141513:   0%|          | 0/11 [00:00<?, ?it/s]
iteration: 0008, losses: 0.141513:  82%|########1 | 9/11 [00:00<00:00, 87.40it/s]
iteration: 0009, losses: 0.147090:  82%|########1 | 9/11 [00:00<00:00, 87.40it/s]
iteration: 0010, losses: 0.127453:  82%|########1 | 9/11 [00:00<00:00, 87.40it/s]
iteration: 0010, losses: 0.127453: 100%|##########| 11/11 [00:00<00:00, 93.25it/s]

  0%|          | 0/11 [00:00<?, ?it/s]
100%|##########| 11/11 [00:00<00:00, 361.29it/s]
the running loss of the test set 0.159493
train loss: 0.157995 test loss: 0.159493

iteration: 0000, losses: 0.144717
iteration: 0001, losses: 0.144641
iteration: 0002, losses: 0.144201
iteration: 0003, losses: 0.145869
iteration: 0004, losses: 0.145609
iteration: 0005, losses: 0.140880
iteration: 0006, losses: 0.148887
iteration: 0007, losses: 0.144380
iteration: 0008, losses: 0.146893
iteration: 0009, losses: 0.147090
iteration: 0010, losses: 0.123721: 100%|##########| 11/11 [00:00<00:00, 82.83it/s]

100%|##########| 11/11 [00:00<00:00, 184.56it/s]
the running loss of the test set 0.159277
train loss: 0.157689 test loss: 0.159277

iteration: 0000, losses: 0.147223
iteration: 0001, losses: 0.141434
iteration: 0002, losses: 0.145635
iteration: 0003, losses: 0.141822
iteration: 0004, losses: 0.146172
iteration: 0005, losses: 0.140953
iteration: 0006, losses: 0.146074
iteration: 0007, losses: 0.147929
iteration: 0008, losses: 0.147657
iteration: 0009, losses: 0.142199
iteration: 0010, losses: 0.128812: 100%|##########| 11/11 [00:00<00:00, 94.34it/s]

100%|##########| 11/11 [00:00<00:00, 241.17it/s]
the running loss of the test set 0.159053
train loss: 0.157591 test loss: 0.159053

iteration: 0000, losses: 0.146868
iteration: 0001, losses: 0.146053
iteration: 0002, losses: 0.142472
iteration: 0003, losses: 0.141248
iteration: 0004, losses: 0.142822
iteration: 0005, losses: 0.142209
iteration: 0006, losses: 0.144675
iteration: 0007, losses: 0.143876
iteration: 0008, losses: 0.146293
iteration: 0009, losses: 0.145098
iteration: 0010, losses: 0.132456: 100%|##########| 11/11 [00:00<00:00, 107.71it/s]

100%|##########| 11/11 [00:00<00:00, 178.73it/s]
the running loss of the test set 0.158822
train loss: 0.157407 test loss: 0.158822

iteration: 0000, losses: 0.143955
iteration: 0001, losses: 0.143095
iteration: 0002, losses: 0.144462
iteration: 0003, losses: 0.148772
iteration: 0004, losses: 0.142935
iteration: 0005, losses: 0.143608
iteration: 0006, losses: 0.143890
iteration: 0007, losses: 0.142942
iteration: 0008, losses: 0.145602
iteration: 0009, losses: 0.144574
iteration: 0010, losses: 0.127424: 100%|##########| 11/11 [00:00<00:00, 89.35it/s]

100%|##########| 11/11 [00:00<00:00, 252.36it/s]
the running loss of the test set 0.158585
train loss: 0.157126 test loss: 0.158585

iteration: 0000, losses: 0.142366
iteration: 0001, losses: 0.139528
iteration: 0002, losses: 0.145957
iteration: 0003, losses: 0.141366
iteration: 0004, losses: 0.149768
iteration: 0005, losses: 0.142049
iteration: 0006, losses: 0.146685
iteration: 0007, losses: 0.143344
iteration: 0008, losses: 0.147672
iteration: 0009, losses: 0.144595
iteration: 0010, losses: 0.124965: 100%|##########| 11/11 [00:00<00:00, 78.03it/s]

100%|##########| 11/11 [00:00<00:00, 208.86it/s]
the running loss of the test set 0.158342
train loss: 0.156830 test loss: 0.158342

iteration: 0000, losses: 0.141712
iteration: 0001, losses: 0.145025
iteration: 0002, losses: 0.142295
iteration: 0003, losses: 0.144204
iteration: 0004, losses: 0.141404
iteration: 0005, losses: 0.143934
iteration: 0006, losses: 0.145742
iteration: 0007, losses: 0.143422
iteration: 0008, losses: 0.146308
iteration: 0009, losses: 0.145457
iteration: 0010, losses: 0.126249: 100%|##########| 11/11 [00:00<00:00, 78.31it/s]

100%|##########| 11/11 [00:00<00:00, 373.08it/s]
the running loss of the test set 0.158094
train loss: 0.156575 test loss: 0.158094

iteration: 0000, losses: 0.141913
iteration: 0001, losses: 0.144524
iteration: 0002, losses: 0.141508
iteration: 0003, losses: 0.141705
iteration: 0004, losses: 0.148190
iteration: 0005, losses: 0.144646
iteration: 0006, losses: 0.144164
iteration: 0007, losses: 0.143023
iteration: 0008, losses: 0.141938
iteration: 0009, losses: 0.145025
iteration: 0010, losses: 0.127473: 100%|##########| 11/11 [00:00<00:00, 67.25it/s]

100%|##########| 11/11 [00:00<00:00, 250.01it/s]
the running loss of the test set 0.157843
train loss: 0.156411 test loss: 0.157843

iteration: 0000, losses: 0.147378
iteration: 0001, losses: 0.145639
iteration: 0002, losses: 0.144745
iteration: 0003, losses: 0.139797
iteration: 0004, losses: 0.140621
iteration: 0005, losses: 0.141734
iteration: 0006, losses: 0.142557
iteration: 0007, losses: 0.141992
iteration: 0008, losses: 0.143174
iteration: 0009, losses: 0.142192
iteration: 0010, losses: 0.132174: 100%|##########| 11/11 [00:00<00:00, 83.55it/s]

100%|##########| 11/11 [00:00<00:00, 229.55it/s]
the running loss of the test set 0.157588
train loss: 0.156200 test loss: 0.157588

iteration: 0000, losses: 0.143383
iteration: 0001, losses: 0.146103
iteration: 0002, losses: 0.143289
iteration: 0003, losses: 0.147881
iteration: 0004, losses: 0.143895
iteration: 0005, losses: 0.141415
iteration: 0006, losses: 0.140268
iteration: 0007, losses: 0.143860
iteration: 0008, losses: 0.141267
iteration: 0009, losses: 0.141337
iteration: 0010, losses: 0.127480: 100%|##########| 11/11 [00:00<00:00, 105.15it/s]

100%|##########| 11/11 [00:00<00:00, 246.61it/s]
the running loss of the test set 0.157562
train loss: 0.156018 test loss: 0.157562

iteration: 0000, losses: 0.147237
iteration: 0001, losses: 0.141396
iteration: 0002, losses: 0.141676
iteration: 0003, losses: 0.142824
iteration: 0004, losses: 0.140577
iteration: 0005, losses: 0.142683
iteration: 0006, losses: 0.139639
iteration: 0007, losses: 0.140248
iteration: 0008, losses: 0.147079
iteration: 0009, losses: 0.149413
iteration: 0010, losses: 0.127138: 100%|##########| 11/11 [00:00<00:00, 83.82it/s]

100%|##########| 11/11 [00:00<00:00, 370.92it/s]
the running loss of the test set 0.157536
train loss: 0.155991 test loss: 0.157536

iteration: 0000, losses: 0.145318
iteration: 0001, losses: 0.140145
iteration: 0002, losses: 0.142236
iteration: 0003, losses: 0.144938
iteration: 0004, losses: 0.140586
iteration: 0005, losses: 0.141173
iteration: 0006, losses: 0.144383
iteration: 0007, losses: 0.144293
iteration: 0008, losses: 0.143258
iteration: 0009, losses: 0.145977
iteration: 0010, losses: 0.127566: 100%|##########| 11/11 [00:00<00:00, 81.77it/s]

100%|##########| 11/11 [00:00<00:00, 245.97it/s]
the running loss of the test set 0.157510
train loss: 0.155987 test loss: 0.157510

iteration: 0000, losses: 0.144768
iteration: 0001, losses: 0.141913
iteration: 0002, losses: 0.144035
iteration: 0003, losses: 0.145529
iteration: 0004, losses: 0.143102
iteration: 0005, losses: 0.145844
iteration: 0006, losses: 0.142021
iteration: 0007, losses: 0.143140
iteration: 0008, losses: 0.142881
iteration: 0009, losses: 0.142559
iteration: 0010, losses: 0.122551: 100%|##########| 11/11 [00:00<00:00, 96.34it/s]

100%|##########| 11/11 [00:00<00:00, 246.17it/s]
the running loss of the test set 0.157483
train loss: 0.155834 test loss: 0.157483

iteration: 0000, losses: 0.143024
iteration: 0001, losses: 0.138273
iteration: 0002, losses: 0.145756
iteration: 0003, losses: 0.142786
iteration: 0004, losses: 0.143961
iteration: 0005, losses: 0.142979
iteration: 0006, losses: 0.141483
iteration: 0007, losses: 0.142995
iteration: 0008, losses: 0.148012
iteration: 0009, losses: 0.145626
iteration: 0010, losses: 0.123234: 100%|##########| 11/11 [00:00<00:00, 83.17it/s]

100%|##########| 11/11 [00:00<00:00, 248.55it/s]
the running loss of the test set 0.157457
train loss: 0.155813 test loss: 0.157457

iteration: 0000, losses: 0.139115
iteration: 0001, losses: 0.144880
iteration: 0002, losses: 0.144192
iteration: 0003, losses: 0.141833
iteration: 0004, losses: 0.145795
iteration: 0005, losses: 0.143777
iteration: 0006, losses: 0.143108
iteration: 0007, losses: 0.140864
iteration: 0008, losses: 0.142754
iteration: 0009, losses: 0.148567
iteration: 0010, losses: 0.123071: 100%|##########| 11/11 [00:00<00:00, 100.86it/s]

100%|##########| 11/11 [00:00<00:00, 165.95it/s]
the running loss of the test set 0.157430
train loss: 0.155795 test loss: 0.157430

iteration: 0000, losses: 0.142137
iteration: 0001, losses: 0.142284
iteration: 0002, losses: 0.142452
iteration: 0003, losses: 0.140749
iteration: 0004, losses: 0.141132
iteration: 0005, losses: 0.143820
iteration: 0006, losses: 0.148505
iteration: 0007, losses: 0.143308
iteration: 0008, losses: 0.141712
iteration: 0009, losses: 0.142344
iteration: 0010, losses: 0.130901: 100%|##########| 11/11 [00:00<00:00, 91.09it/s]

100%|##########| 11/11 [00:00<00:00, 185.07it/s]
the running loss of the test set 0.157404
train loss: 0.155934 test loss: 0.157404

iteration: 0000, losses: 0.144270
iteration: 0001, losses: 0.140812
iteration: 0002, losses: 0.149060
iteration: 0003, losses: 0.143140
iteration: 0004, losses: 0.143202
iteration: 0005, losses: 0.141218
iteration: 0006, losses: 0.141691
iteration: 0007, losses: 0.141274
iteration: 0008, losses: 0.141418
iteration: 0009, losses: 0.144175
iteration: 0010, losses: 0.127889: 100%|##########| 11/11 [00:00<00:00, 88.74it/s]

100%|##########| 11/11 [00:00<00:00, 185.32it/s]
the running loss of the test set 0.157377
train loss: 0.155815 test loss: 0.157377

iteration: 0010, losses: 0.127911: 100%|##########| 11/11 [00:00<00:00, 89.81it/s]
100%|##########| 11/11 [00:00<00:00, 359.12it/s]
the running loss of the test set 0.157350
train loss: 0.155821 test loss: 0.157350

iteration: 0010, losses: 0.128959: 100%|##########| 11/11 [00:00<00:00, 77.02it/s]
100%|##########| 11/11 [00:00<00:00, 177.95it/s]
the running loss of the test set 0.157323
train loss: 0.155806 test loss: 0.157323

iteration: 0010, losses: 0.126320: 100%|##########| 11/11 [00:00<00:00, 92.05it/s]
100%|##########| 11/11 [00:00<00:00, 232.63it/s]
the running loss of the test set 0.157296
train loss: 0.155695 test loss: 0.157296

iteration: 0010, losses: 0.128171: 100%|##########| 11/11 [00:00<00:00, 90.90it/s]
100%|##########| 11/11 [00:00<00:00, 193.80it/s]
the running loss of the test set 0.157269
train loss: 0.155797 test loss: 0.157269

iteration: 0010, losses: 0.125298: 100%|##########| 11/11 [00:00<00:00, 93.10it/s]
100%|##########| 11/11 [00:00<00:00, 352.22it/s]
the running loss of the test set 0.157243
train loss: 0.155638 test loss: 0.157243

iteration: 0010, losses: 0.129712: 100%|##########| 11/11 [00:00<00:00, 72.79it/s]
100%|##########| 11/11 [00:00<00:00, 353.36it/s]
the running loss of the test set 0.157216
train loss: 0.155749 test loss: 0.157216

iteration: 0010, losses: 0.129418: 100%|##########| 11/11 [00:00<00:00, 82.52it/s]
100%|##########| 11/11 [00:00<00:00, 241.99it/s]
the running loss of the test set 0.157189
train loss: 0.155703 test loss: 0.157189

iteration: 0010, losses: 0.124486: 100%|##########| 11/11 [00:00<00:00, 104.53it/s]
100%|##########| 11/11 [00:00<00:00, 189.52it/s]
the running loss of the test set 0.157162
train loss: 0.155557 test loss: 0.157162

iteration: 0010, losses: 0.130473: 100%|##########| 11/11 [00:00<00:00, 121.45it/s]
100%|##########| 11/11 [00:00<00:00, 149.17it/s]
the running loss of the test set 0.157136
train loss: 0.155669 test loss: 0.157136

iteration: 0010, losses: 0.121693: 100%|##########| 11/11 [00:00<00:00, 101.64it/s]
100%|##########| 11/11 [00:00<00:00, 184.77it/s]
the running loss of the test set 0.157109
train loss: 0.155418 test loss: 0.157109

iteration: 0010, losses: 0.129301: 100%|##########| 11/11 [00:00<00:00, 73.29it/s]
100%|##########| 11/11 [00:00<00:00, 241.86it/s]
the running loss of the test set 0.157082
train loss: 0.155548 test loss: 0.157082

iteration: 0010, losses: 0.122873: 100%|##########| 11/11 [00:00<00:00, 104.52it/s]
100%|##########| 11/11 [00:00<00:00, 184.75it/s]
the running loss of the test set 0.157056
train loss: 0.155398 test loss: 0.157056

iteration: 0010, losses: 0.123264: 100%|##########| 11/11 [00:00<00:00, 90.42it/s]
100%|##########| 11/11 [00:00<00:00, 250.57it/s]
the running loss of the test set 0.157029
train loss: 0.155375 test loss: 0.157029

iteration: 0010, losses: 0.123892: 100%|##########| 11/11 [00:00<00:00, 105.25it/s]
100%|##########| 11/11 [00:00<00:00, 187.83it/s]
the running loss of the test set 0.157003
train loss: 0.155392 test loss: 0.157003

iteration: 0010, losses: 0.127809: 100%|##########| 11/11 [00:00<00:00, 91.53it/s]
100%|##########| 11/11 [00:00<00:00, 341.29it/s]
the running loss of the test set 0.156976
train loss: 0.155475 test loss: 0.156976

iteration: 0010, losses: 0.125957: 100%|##########| 11/11 [00:00<00:00, 67.17it/s]

100%|##########| 11/11 [00:00<00:00, 181.94it/s]
the running loss of the test set 0.156950
train loss: 0.155390 test loss: 0.156950

iteration: 0010, losses: 0.127282: 100%|##########| 11/11 [00:00<00:00, 91.82it/s]

100%|##########| 11/11 [00:00<00:00, 187.74it/s]
the running loss of the test set 0.156924
train loss: 0.155396 test loss: 0.156924

iteration: 0010, losses: 0.122314: 100%|##########| 11/11 [00:00<00:00, 90.05it/s]

100%|##########| 11/11 [00:00<00:00, 361.01it/s]
the running loss of the test set 0.156898
train loss: 0.155236 test loss: 0.156898

iteration: 0010, losses: 0.128100: 100%|##########| 11/11 [00:00<00:00, 93.19it/s]

100%|##########| 11/11 [00:00<00:00, 184.12it/s]
the running loss of the test set 0.156872
train loss: 0.155383 test loss: 0.156872

iteration: 0010, losses: 0.127336: 100%|##########| 11/11 [00:00<00:00, 89.48it/s]

100%|##########| 11/11 [00:00<00:00, 237.04it/s]
the running loss of the test set 0.156846
train loss: 0.155290 test loss: 0.156846

iteration: 0010, losses: 0.123899: 100%|##########| 11/11 [00:00<00:00, 84.40it/s]

100%|##########| 11/11 [00:00<00:00, 367.79it/s]
the running loss of the test set 0.156820
train loss: 0.155211 test loss: 0.156820

iteration: 0010, losses: 0.123116: 100%|##########| 11/11 [00:00<00:00, 90.23it/s]

100%|##########| 11/11 [00:00<00:00, 173.78it/s]
the running loss of the test set 0.156794
train loss: 0.155172 test loss: 0.156794

iteration: 0010, losses: 0.122146: 100%|##########| 11/11 [00:00<00:00, 92.38it/s]

100%|##########| 11/11 [00:00<00:00, 182.04it/s]
the running loss of the test set 0.156768
train loss: 0.155103 test loss: 0.156768

iteration: 0010, losses: 0.127747: 100%|##########| 11/11 [00:00<00:00, 105.46it/s]

100%|##########| 11/11 [00:00<00:00, 242.77it/s]
the running loss of the test set 0.156742
train loss: 0.155260 test loss: 0.156742

iteration: 0010, losses: 0.123064: 100%|##########| 11/11 [00:00<00:00, 84.10it/s]

100%|##########| 11/11 [00:00<00:00, 240.94it/s]
the running loss of the test set 0.156717
train loss: 0.155060 test loss: 0.156717

iteration: 0010, losses: 0.126859: 100%|##########| 11/11 [00:00<00:00, 94.14it/s]

100%|##########| 11/11 [00:00<00:00, 181.07it/s]
the running loss of the test set 0.156691
train loss: 0.155164 test loss: 0.156691

iteration: 0010, losses: 0.123069: 100%|##########| 11/11 [00:00<00:00, 102.27it/s]

100%|##########| 11/11 [00:00<00:00, 244.39it/s]
the running loss of the test set 0.156665
train loss: 0.155062 test loss: 0.156665

iteration: 0010, losses: 0.126165: 100%|##########| 11/11 [00:00<00:00, 81.85it/s]

100%|##########| 11/11 [00:00<00:00, 186.77it/s]
the running loss of the test set 0.156640
train loss: 0.155086 test loss: 0.156640

iteration: 0010, losses: 0.129284: 100%|##########| 11/11 [00:00<00:00, 90.94it/s]

100%|##########| 11/11 [00:00<00:00, 345.59it/s]
the running loss of the test set 0.156615
train loss: 0.155136 test loss: 0.156615

iteration: 0010, losses: 0.124538: 100%|##########| 11/11 [00:00<00:00, 94.82it/s]

100%|##########| 11/11 [00:00<00:00, 182.03it/s]
the running loss of the test set 0.156589
train loss: 0.154984 test loss: 0.156589

iteration: 0010, losses: 0.128975: 100%|##########| 11/11 [00:00<00:00, 90.61it/s]

100%|##########| 11/11 [00:00<00:00, 244.43it/s]
the running loss of the test set 0.156564
train loss: 0.154964 test loss: 0.156564

iteration: 0010, losses: 0.125565: 100%|##########| 11/11 [00:00<00:00, 80.81it/s]

100%|##########| 11/11 [00:00<00:00, 247.79it/s]
the running loss of the test set 0.156539
train loss: 0.154964 test loss: 0.156539

iteration: 0010, losses: 0.125307: 100%|##########| 11/11 [00:00<00:00, 101.04it/s]

100%|##########| 11/11 [00:00<00:00, 196.49it/s]
the running loss of the test set 0.156514
train loss: 0.154963 test loss: 0.156514

iteration: 0010, losses: 0.124718: 100%|##########| 11/11 [00:00<00:00, 87.08it/s]

100%|##########| 11/11 [00:00<00:00, 173.40it/s]
the running loss of the test set 0.156489
train loss: 0.154917 test loss: 0.156489

iteration: 0010, losses: 0.128252: 100%|##########| 11/11 [00:00<00:00, 94.23it/s]

100%|##########| 11/11 [00:00<00:00, 189.15it/s]
the running loss of the test set 0.156464
train loss: 0.155002 test loss: 0.156464

iteration: 0010, losses: 0.125297: 100%|##########| 11/11 [00:00<00:00, 117.14it/s]

100%|##########| 11/11 [00:00<00:00, 183.31it/s]
the running loss of the test set 0.156439
train loss: 0.154868 test loss: 0.156439

iteration: 0010, losses: 0.127417: 100%|##########| 11/11 [00:00<00:00, 84.50it/s]

100%|##########| 11/11 [00:00<00:00, 256.29it/s]
the running loss of the test set 0.156414
train loss: 0.154914 test loss: 0.156414

iteration: 0010, losses: 0.125760: 100%|##########| 11/11 [00:00<00:00, 109.95it/s]

100%|##########| 11/11 [00:00<00:00, 247.89it/s]
the running loss of the test set 0.156389
train loss: 0.154842 test loss: 0.156389

iteration: 0010, losses: 0.124988: 100%|##########| 11/11 [00:00<00:00, 82.95it/s]

100%|##########| 11/11 [00:00<00:00, 368.08it/s]
the running loss of the test set 0.156365
train loss: 0.154819 test loss: 0.156365

iteration: 0010, losses: 0.129116: 100%|##########| 11/11 [00:00<00:00, 82.83it/s]

100%|##########| 11/11 [00:00<00:00, 185.74it/s]
the running loss of the test set 0.156340
train loss: 0.154881 test loss: 0.156340

iteration: 0010, losses: 0.125144: 100%|##########| 11/11 [00:00<00:00, 106.83it/s]

100%|##########| 11/11 [00:00<00:00, 252.25it/s]
the running loss of the test set 0.156315
train loss: 0.154748 test loss: 0.156315

iteration: 0010, losses: 0.126290: 100%|##########| 11/11 [00:00<00:00, 85.71it/s]

100%|##########| 11/11 [00:00<00:00, 352.86it/s]
the running loss of the test set 0.156291
train loss: 0.154722 test loss: 0.156291

iteration: 0010, losses: 0.125854: 100%|##########| 11/11 [00:00<00:00, 94.67it/s]

100%|##########| 11/11 [00:00<00:00, 187.32it/s]
the running loss of the test set 0.156266
train loss: 0.154696 test loss: 0.156266

iteration: 0010, losses: 0.122177: 100%|##########| 11/11 [00:00<00:00, 97.67it/s]

100%|##########| 11/11 [00:00<00:00, 248.67it/s]
the running loss of the test set 0.156242
train loss: 0.154605 test loss: 0.156242

And that’s it. We’re done with our IMUCorrector tutorial. Thanks for reading.

Total running time of the script: ( 0 minutes 17.385 seconds)

