This document explains how the statistics for the zero-shot evaluation of the Kosmos2 model on several datasets were computed.

At the end of the evaluation phase, I obtained a file called "zero_shot_final.csv". The file is a table of the following form:

| environment | entity_type | lexical_references | image_bbox | image_normal | bounding_box | kosmos_bounding_box | overlap_index | Match |
|---|---|---|---|---|---|---|---|---|
| 2416 | Painting | ['quadro'] | Robocup/2416/images/LivingRoom/bounding_box/position_0/2416_LivingRoom_bounding_box_pos_0_180.jpg | Robocup/2416/images/LivingRoom/normal/position_0/2416_LivingRoom_pos_0_180.jpg | (0.24666666666666667, 0.2733333333333333, 0.8933333333333333, 0.5966666666666667) | (0.234375, 0.265625, 0.890625, 0.609375) | 0.9194192970203593 | True |
| 2644 | Painting | ['quadro'] | Simpleset/2644/images/LivingRoom/bounding_box/position_5/2644_LivingRoom_bounding_box_pos_5_90.jpg | Simpleset/2644/images/LivingRoom/normal/position_5/2644_LivingRoom_pos_5_90.jpg | (0.8866666666666667, 0.42, 0.9983333333333333, 0.755) | (0.015625, 0.015625, 0.359375, 0.703125) | 0.0 | False |
| 2746 | Dining Table | ['tavolo_da_pranzo'] | S4R/2746/images/LivingRoom/bounding_box/position_1/2746_LivingRoom_bounding_box_pos_1_270.jpg | S4R/2746/images/LivingRoom/normal/position_1/2746_LivingRoom_pos_1_270.jpg | (0.31333333333333335, 0.29, 0.385, 0.3616666666666667) | (0.015625, 0.015625, 0.328125, 0.859375) | 0.003959207069253051 | False |
| 2684 | Arm Chair | ['poltrona'] | S4R/2684/images/LivingRoom/bounding_box/position_3/2684_LivingRoom_bounding_box_pos_3_0.jpg | S4R/2684/images/LivingRoom/normal/position_3/2684_LivingRoom_pos_3_0.jpg | (0.36333333333333334, 0.315, 0.4583333333333333, 0.40166666666666667) | (0.359375, 0.296875, 0.453125, 0.421875) | 0.6394293865905849 | True |
| 2279 | Painting | ['quadro'] | Robocup/2279/images/LivingRoom/bounding_box/position_0/2279_LivingRoom_bounding_box_pos_0_0.jpg | Robocup/2279/images/LivingRoom/normal/position_0/2279_LivingRoom_pos_0_0.jpg | (0.5533333333333333, 0.07, 0.7283333333333334, 0.20333333333333334) | (0.546875, 0.078125, 0.734375, 0.203125) | 0.8786610878661091 | True |
| 3353 | Floor Lamp | ['lampada_da_terra'] | Rockin2/3353/images/LivingRoom/bounding_box/position_0/3353_LivingRoom_bounding_box_pos_0_0.jpg | Rockin2/3353/images/LivingRoom/normal/position_0/3353_LivingRoom_pos_0_0.jpg | (0.41, 0.26166666666666666, 0.5066666666666667, 0.4716666666666667) | (0.265625, 0.421875, 0.609375, 0.890625) | 0.02725175434888814 | False |
| 3385 | Garbage Can | ['pattumiera'] | Rockin2/3385/images/LivingRoom/bounding_box/position_3/3385_LivingRoom_bounding_box_pos_3_180.jpg | Rockin2/3385/images/LivingRoom/normal/position_3/3385_LivingRoom_pos_3_180.jpg | (0.5366666666666666, 0.605, 0.7216666666666667, 0.7733333333333333) | (0.515625, 0.609375, 0.734375, 0.796875) | 0.725219167164774 | True |
| 3068 | Chair | ['sedia'] | Rockin1/3068/images/LivingRoom/bounding_box/position_2/3068_LivingRoom_bounding_box_pos_2_90.jpg | Rockin1/3068/images/LivingRoom/normal/position_2/3068_LivingRoom_pos_2_90.jpg | (0.47833333333333333, 0.33666666666666667, 0.5533333333333333, 0.56) | (0.390625, 0.390625, 0.984375, 0.703125) | 0.06700181308719304 | False |

Each row contains: the entity, the image under consideration, the target bounding box (taken as ground truth), the bounding box generated by the model, the overlap index, and a boolean value indicating whether the model managed to locate the entity in the image.
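
This note does not define the overlap index. The sample rows are consistent with an intersection over union (IoU) of the two boxes, read as (x1, y1, x2, y2) corners. The sketch below rests on that assumption and is not the evaluation code itself; the function name `iou` is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes (assumed layout)."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# First sample row: bounding_box (target) and kosmos_bounding_box
target = (0.24666666666666667, 0.2733333333333333, 0.8933333333333333, 0.5966666666666667)
kosmos = (0.234375, 0.265625, 0.890625, 0.609375)
first_row_iou = iou(target, kosmos)
```

For the first sample row this gives about 0.9194, in line with the overlap_index stored in the file. The second sample row, whose boxes do not intersect, gives 0.0.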

I then computed a set of statistics for each distinct entity type.

Number of occurrences per entity type:

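| entity_type | Number of Occurrences |
|---|---|
| Total | 10000 |
| Painting | 751 |
| Cell Phone | 628 |
| Remote Control | 518 |
| Book | 485 |
| Chair | 443 |
| Pen | 356 |
| Dining Table | 317 |
| Box | 214 |
| Key Chain | 207 |
| Counter Top | 204 |
| Bowl | 194 |
| Bottle | 192 |
| House Plant | 192 |
| Television | 185 |
| Statue | 169 |
| Plate | 165 |
| Sofa | 161 |
| Laptop | 161 |
| Fridge | 157 |
| Knife | 150 |
| Bed | 141 |
| Dresser | 141 |
| Wine Bottle | 138 |
| Garbage Can | 135 |
| Fork | 129 |
| Spoon | 128 |
| Pillow | 125 |
| Mug | 119 |
| Arm Chair | 114 |
| Bread | 114 |
| Spray Bottle | 113 |
| Vase | 110 |
| Soap Bottle | 106 |
| Spatula | 94 |
| Pencil | 91 |
| Toaster | 89 |
| Shelving Unit | 88 |
| Toilet | 86 |
| Kettle | 84 |
| TV Stand | 84 |
| Butter Knife | 83 |
| Newspaper | 76 |
| Apple | 75 |
| Cup | 73 |
| Washing Machine | 73 |
| Side Table | 72 |
| Candle | 70 |
| Sink | 64 |
| Floor Lamp | 64 |
| Credit Card | 60 |
| Pepper Shaker | 58 |
| Potato | 57 |
| Salt Shaker | 56 |
| Tomato | 56 |
| Stool | 55 |
| Pan | 54 |
| Garbage Bag | 54 |
| Faucet | 54 |
| Dish Sponge | 52 |
| Lettuce | 51 |
| Microwave | 46 |
| Toilet Paper | 46 |
| Watch | 43 |
| Teddy Bear | 43 |
| Paper Towel Roll | 38 |
| Desk Lamp | 37 |
| Plunger | 37 |
| Basket Ball | 35 |
| Pot | 35 |
| Dog Bed | 34 |
| Ladle | 34 |
| Baseball Bat | 33 |
| Cart | 32 |
| Tissue Box | 26 |
| Egg | 23 |
| Alarm Clock | 22 |
| Desk | 17 |
| Coffee Machine | 14 |
| Soap Bar | 13 |
| Tennis Racket | 11 |
| Safe | 11 |
| Cloth | 10 |
| Laundry Hamper | 9 |
| Vacuum Cleaner | 7 |
| Boots | 3 |
| Desktop | 2 |
| Room Decor | 2 |
| Table Top Decor | 1 |
| Ottoman | 1 |
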
This shows how many times each entity appeared in the data we evaluated.

# Group the DataFrame by 'entity_type'
grouped = df.groupby('entity_type')

# Iterate over each entity type
for entity_type, group in grouped:
    # Number of rows (occurrences) for the current entity type
    total_instances = len(group)
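
The same counts can also be obtained without an explicit loop. A minimal sketch, using a small made-up DataFrame (`toy_df`, a hypothetical name) in place of zero_shot_final.csv:

```python
import pandas as pd

# Toy stand-in for zero_shot_final.csv (values are illustrative only)
toy_df = pd.DataFrame({
    'entity_type': ['Painting', 'Chair', 'Painting', 'Book', 'Painting'],
    'Match': [True, False, True, False, False],
})

# value_counts() returns the occurrences per entity type,
# already sorted from most to least frequent
occurrences = toy_df['entity_type'].value_counts()
```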

Percentage of correct instances

For each entity type, this is the percentage of instances that the model located correctly (rows where Match is True).

| entity_type | Percentage of Matches |
|---|---|
| Total | 22.2700 |
| Painting | 40.3462 |
| Cell Phone | 2.0701 |
| Remote Control | 1.7375 |
| Book | 8.8660 |
| Chair | 25.0564 |
| Pen | 0.0000 |
| Dining Table | 58.6751 |
| Box | 10.2804 |
| Key Chain | 0.4831 |
| Counter Top | 37.7451 |
| Bowl | 5.6701 |
| Bottle | 6.7708 |
| House Plant | 46.3542 |
| Television | 63.7838 |
| Statue | 20.7101 |
| Plate | 4.2424 |
| Sofa | 65.8385 |
| Laptop | 16.7702 |
| Fridge | 66.2420 |
| Knife | 2.6667 |
| Bed | 52.4823 |
| Dresser | 61.7021 |
| Wine Bottle | 13.7681 |
| Garbage Can | 66.6667 |
| Fork | 0.0000 |
| Spoon | 0.0000 |
| Pillow | 15.2000 |
| Mug | 4.2017 |
| Arm Chair | 54.3860 |
| Bread | 7.8947 |
| Spray Bottle | 15.0442 |
| Vase | 8.1818 |
| Soap Bottle | 6.6038 |
| Spatula | 0.0000 |
| Pencil | 1.0989 |
| Toaster | 12.3596 |
| Shelving Unit | 61.3636 |
| Toilet | 66.2791 |
| Kettle | 5.9524 |
| TV Stand | 35.7143 |
| Butter Knife | 0.0000 |
| Newspaper | 5.2632 |
| Apple | 2.6667 |
| Cup | 4.1096 |
| Washing Machine | 69.8630 |
| Side Table | 40.2778 |
| Candle | 1.4286 |
| Sink | 71.8750 |
| Floor Lamp | 59.3750 |
| Credit Card | 0.0000 |
| Pepper Shaker | 0.0000 |
| Potato | 5.2632 |
| Salt Shaker | 0.0000 |
| Tomato | 8.9286 |
| Stool | 56.3636 |
| Pan | 7.4074 |
| Garbage Bag | 68.5185 |
| Faucet | 11.1111 |
| Dish Sponge | 0.0000 |
| Lettuce | 11.7647 |
| Microwave | 13.0435 |
| Toilet Paper | 4.3478 |
| Watch | 0.0000 |
| Teddy Bear | 48.8372 |
| Paper Towel Roll | 7.8947 |
| Desk Lamp | 8.1081 |
| Plunger | 24.3243 |
| Basket Ball | 31.4286 |
| Pot | 8.5714 |
| Dog Bed | 44.1176 |
| Ladle | 2.9412 |
| Baseball Bat | 15.1515 |
| Cart | 46.8750 |
| Tissue Box | 3.8462 |
| Egg | 0.0000 |
| Alarm Clock | 18.1818 |
| Desk | 47.0588 |
| Coffee Machine | 35.7143 |
| Soap Bar | 0.0000 |
| Tennis Racket | 9.0909 |
| Safe | 36.3636 |
| Cloth | 0.0000 |
| Laundry Hamper | 44.4444 |
| Vacuum Cleaner | 57.1429 |
| Boots | 0.0000 |
| Desktop | 0.0000 |
| Room Decor | 0.0000 |
| Table Top Decor | 0.0000 |
| Ottoman | 100.0000 |

    total_instances = len(group)
    total_matches = group['Match'].sum()
    # Calculate percentage of times there's a match
    if total_instances > 0:
        percentage_match = (total_matches / total_instances) * 100
    else:
        percentage_match = 0
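
Because Match is boolean, the same percentage is simply the mean of that column times 100. The sketch below shows this on a made-up DataFrame (`toy_df` is hypothetical) and then checks one reported figure by hand: a Percentage of Matches of 40.3462 over 751 Painting rows corresponds to 303 matches.

```python
import pandas as pd

# Toy stand-in for zero_shot_final.csv (values are illustrative only)
toy_df = pd.DataFrame({
    'entity_type': ['Painting', 'Painting', 'Painting', 'Painting', 'Chair', 'Chair'],
    'Match': [True, True, False, True, False, False],
})

# The mean of a boolean column is the fraction of True values
percentage_match = toy_df.groupby('entity_type')['Match'].mean() * 100

# Arithmetic check on the reported Painting figure: 303 matches out of 751 rows
painting_pct = round(303 / 751 * 100, 4)
```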

Average overlap index

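Empty cells mean that the entity type has no rows in that subset (no matched rows, or no unmatched rows), so the average is undefined.

| entity_type | Average Overlapping Index | Average Overlapping Index (Matched) | Average Overlapping Index (Unmatched) |
|---|---|---|---|
| Total | 0.1784 | 0.7687 | 0.0093 |
| Painting | 0.3207 | 0.7896 | 0.0037 |
| Cell Phone | 0.0169 | 0.6653 | 0.0032 |
| Remote Control | 0.0153 | 0.6365 | 0.0043 |
| Book | 0.0693 | 0.7046 | 0.0075 |
| Chair | 0.2052 | 0.7385 | 0.0269 |
| Pen | 0.0011 |  | 0.0011 |
| Dining Table | 0.4669 | 0.7749 | 0.0296 |
| Box | 0.0849 | 0.7023 | 0.0142 |
| Key Chain | 0.0043 | 0.6105 | 0.0013 |
| Counter Top | 0.3100 | 0.8014 | 0.0121 |
| Bowl | 0.0428 | 0.6360 | 0.0071 |
| Bottle | 0.0475 | 0.6471 | 0.0040 |
| House Plant | 0.3412 | 0.7155 | 0.0178 |
| Television | 0.4931 | 0.7677 | 0.0095 |
| Statue | 0.1470 | 0.6560 | 0.0140 |
| Plate | 0.0334 | 0.7049 | 0.0036 |
| Sofa | 0.5580 | 0.8274 | 0.0389 |
| Laptop | 0.1320 | 0.7259 | 0.0123 |
| Fridge | 0.5615 | 0.8449 | 0.0053 |
| Knife | 0.0227 | 0.7009 | 0.0042 |
| Bed | 0.4395 | 0.8191 | 0.0202 |
| Dresser | 0.5399 | 0.8548 | 0.0326 |
| Wine Bottle | 0.0986 | 0.6525 | 0.0102 |
| Garbage Can | 0.5233 | 0.7666 | 0.0367 |
| Fork | 0.0019 |  | 0.0019 |
| Spoon | 0.0014 |  | 0.0014 |
| Pillow | 0.1168 | 0.6873 | 0.0146 |
| Mug | 0.0310 | 0.6205 | 0.0052 |
| Arm Chair | 0.4404 | 0.7965 | 0.0159 |
| Bread | 0.0632 | 0.6982 | 0.0088 |
| Spray Bottle | 0.1057 | 0.6653 | 0.0066 |
| Vase | 0.0639 | 0.6518 | 0.0115 |
| Soap Bottle | 0.0466 | 0.6285 | 0.0054 |
| Spatula | 0.0058 |  | 0.0058 |
| Pencil | 0.0070 | 0.5701 | 0.0007 |
| Toaster | 0.0912 | 0.6816 | 0.0080 |
| Shelving Unit | 0.5259 | 0.8304 | 0.0421 |
| Toilet | 0.5122 | 0.7458 | 0.0529 |
| Kettle | 0.0475 | 0.6470 | 0.0096 |
| TV Stand | 0.2836 | 0.7525 | 0.0230 |
| Butter Knife | 0.0045 |  | 0.0045 |
| Newspaper | 0.0392 | 0.6769 | 0.0037 |
| Apple | 0.0223 | 0.6565 | 0.0050 |
| Cup | 0.0306 | 0.6311 | 0.0049 |
| Washing Machine | 0.5904 | 0.8256 | 0.0452 |
| Side Table | 0.3204 | 0.7835 | 0.0081 |
| Candle | 0.0216 | 0.8907 | 0.0090 |
| Sink | 0.5634 | 0.7741 | 0.0251 |
| Floor Lamp | 0.4851 | 0.7990 | 0.0264 |
| Credit Card | 0.0010 |  | 0.0010 |
| Pepper Shaker | 0.0020 |  | 0.0020 |
| Potato | 0.0349 | 0.6414 | 0.0012 |
| Salt Shaker | 0.0010 |  | 0.0010 |
| Tomato | 0.0581 | 0.6345 | 0.0016 |
| Stool | 0.4302 | 0.7595 | 0.0049 |
| Pan | 0.0531 | 0.6368 | 0.0064 |
| Garbage Bag | 0.5245 | 0.7509 | 0.0318 |
| Faucet | 0.1033 | 0.7218 | 0.0260 |
| Dish Sponge | 0.0045 |  | 0.0045 |
| Lettuce | 0.0811 | 0.6854 | 0.0006 |
| Microwave | 0.1258 | 0.7665 | 0.0297 |
| Toilet Paper | 0.0321 | 0.6579 | 0.0037 |
| Watch | 0.0009 |  | 0.0009 |
| Teddy Bear | 0.3605 | 0.6945 | 0.0417 |
| Paper Towel Roll | 0.0512 | 0.5987 | 0.0043 |
| Desk Lamp | 0.0821 | 0.7218 | 0.0257 |
| Plunger | 0.1827 | 0.6990 | 0.0168 |
| Basket Ball | 0.2573 | 0.6320 | 0.0856 |
| Pot | 0.0637 | 0.7074 | 0.0033 |
| Dog Bed | 0.3621 | 0.8205 | 0.0003 |
| Ladle | 0.0248 | 0.7196 | 0.0038 |
| Baseball Bat | 0.1196 | 0.7497 | 0.0071 |
| Cart | 0.3785 | 0.8061 | 0.0012 |
| Tissue Box | 0.0357 | 0.7297 | 0.0079 |
| Egg | 0.0015 |  | 0.0015 |
| Alarm Clock | 0.1260 | 0.6687 | 0.0055 |
| Desk | 0.4079 | 0.8446 | 0.0197 |
| Coffee Machine | 0.2354 | 0.6260 | 0.0183 |
| Soap Bar | 0.0010 |  | 0.0010 |
| Tennis Racket | 0.0718 | 0.5645 | 0.0226 |
| Safe | 0.2719 | 0.7300 | 0.0101 |
| Cloth | 0.0000 |  | 0.0000 |
| Laundry Hamper | 0.3525 | 0.7875 | 0.0044 |
| Vacuum Cleaner | 0.4380 | 0.7664 | 0.0000 |
| Boots | 0.0068 |  | 0.0068 |
| Desktop | 0.0408 |  | 0.0408 |
| Room Decor | 0.0123 |  | 0.0123 |
| Table Top Decor | 0.0558 |  | 0.0558 |
| Ottoman | 0.9066 | 0.9066 |  |
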
    average_overlap_index = group['overlap_index'].mean()
    average_overlap_index_matched = group[group['Match']]['overlap_index'].mean()
    average_overlap_index_unmatched = group[~group['Match']]['overlap_index'].mean()
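
When an entity type has no matched rows (or no unmatched rows), the filtered subset is empty, its mean is NaN, and `to_csv` writes NaN as an empty field by default. A minimal sketch on made-up values (`toy_group` is hypothetical; the 'Pen' label is only used as an example of an entity type with no matches):

```python
import pandas as pd

# Toy group in which the model never matched (values are illustrative only)
toy_group = pd.DataFrame({
    'overlap_index': [0.001, 0.002, 0.0],
    'Match': [False, False, False],
})

# Mean over an empty selection is NaN; mean over the three misses is defined
matched_mean = toy_group[toy_group['Match']]['overlap_index'].mean()
unmatched_mean = toy_group[~toy_group['Match']]['overlap_index'].mean()

# NaN is exported as an empty CSV field
csv_text = pd.DataFrame({
    'entity_type': ['Pen'],
    'matched': [matched_mean],
    'unmatched': [unmatched_mean],
}).to_csv(index=False, float_format='%.4f')
```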

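Average bounding box size

The size is the area of the ground-truth box (the bounding_box column), computed as width times height from the coordinates stored in the file.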

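Empty cells mean that the entity type has no rows in that subset (no correct rows, or no incorrect rows), so the average is undefined.

| entity_type | Avg BBox Dimensions (Correct) | Avg BBox Dimensions (Incorrect) | Average BBox Dimensions (All) |
|---|---|---|---|
| Total | 0.0763 | 0.0067 | 0.0222 |
| Painting | 0.0672 | 0.0141 | 0.0355 |
| Cell Phone | 0.0063 | 0.0010 | 0.0011 |
| Remote Control | 0.0098 | 0.0008 | 0.0010 |
| Book | 0.0195 | 0.0028 | 0.0043 |
| Chair | 0.0444 | 0.0113 | 0.0196 |
| Pen |  | 0.0004 | 0.0004 |
| Dining Table | 0.0732 | 0.0289 | 0.0549 |
| Box | 0.0164 | 0.0051 | 0.0062 |
| Key Chain | 0.0055 | 0.0003 | 0.0004 |
| Counter Top | 0.1864 | 0.0679 | 0.1126 |
| Bowl | 0.0098 | 0.0029 | 0.0033 |
| Bottle | 0.0105 | 0.0017 | 0.0023 |
| House Plant | 0.0320 | 0.0045 | 0.0172 |
| Television | 0.1015 | 0.0191 | 0.0716 |
| Statue | 0.0165 | 0.0032 | 0.0059 |
| Plate | 0.0153 | 0.0012 | 0.0018 |
| Sofa | 0.1050 | 0.0236 | 0.0772 |
| Laptop | 0.0271 | 0.0047 | 0.0084 |
| Fridge | 0.2056 | 0.0652 | 0.1582 |
| Knife | 0.0139 | 0.0015 | 0.0018 |
| Bed | 0.1410 | 0.0271 | 0.0869 |
| Dresser | 0.1222 | 0.0198 | 0.0830 |
| Wine Bottle | 0.0093 | 0.0054 | 0.0059 |
| Garbage Can | 0.0274 | 0.0102 | 0.0217 |
| Fork |  | 0.0007 | 0.0007 |
| Spoon |  | 0.0004 | 0.0004 |
| Pillow | 0.0200 | 0.0028 | 0.0054 |
| Mug | 0.0116 | 0.0015 | 0.0019 |
| Arm Chair | 0.0597 | 0.0206 | 0.0419 |
| Bread | 0.0155 | 0.0018 | 0.0029 |
| Spray Bottle | 0.0110 | 0.0012 | 0.0027 |
| Vase | 0.0206 | 0.0015 | 0.0031 |
| Soap Bottle | 0.0101 | 0.0025 | 0.0030 |
| Spatula |  | 0.0020 | 0.0020 |
| Pencil | 0.0028 | 0.0003 | 0.0003 |
| Toaster | 0.0181 | 0.0042 | 0.0060 |
| Shelving Unit | 0.1446 | 0.0368 | 0.1030 |
| Toilet | 0.0835 | 0.0132 | 0.0598 |
| Kettle | 0.0128 | 0.0039 | 0.0044 |
| TV Stand | 0.1123 | 0.0213 | 0.0538 |
| Butter Knife |  | 0.0002 | 0.0002 |
| Newspaper | 0.0225 | 0.0013 | 0.0024 |
| Apple | 0.0037 | 0.0019 | 0.0019 |
| Cup | 0.0072 | 0.0015 | 0.0017 |
| Washing Machine | 0.0982 | 0.0260 | 0.0764 |
| Side Table | 0.0557 | 0.0172 | 0.0327 |
| Candle | 0.0061 | 0.0007 | 0.0008 |
| Sink | 0.0772 | 0.0175 | 0.0604 |
| Floor Lamp | 0.1211 | 0.0273 | 0.0830 |
| Credit Card |  | 0.0003 | 0.0003 |
| Pepper Shaker |  | 0.0019 | 0.0019 |
| Potato | 0.0039 | 0.0020 | 0.0021 |
| Salt Shaker |  | 0.0005 | 0.0005 |
| Tomato | 0.0044 | 0.0005 | 0.0008 |
| Stool | 0.0257 | 0.0081 | 0.0181 |
| Pan | 0.0166 | 0.0039 | 0.0048 |
| Garbage Bag | 0.0218 | 0.0055 | 0.0167 |
| Faucet | 0.0227 | 0.0107 | 0.0120 |
| Dish Sponge |  | 0.0006 | 0.0006 |
| Lettuce | 0.0104 | 0.0007 | 0.0019 |
| Microwave | 0.0431 | 0.0089 | 0.0133 |
| Toilet Paper | 0.0064 | 0.0011 | 0.0013 |
| Watch |  | 0.0003 | 0.0003 |
| Teddy Bear | 0.0152 | 0.0027 | 0.0088 |
| Paper Towel Roll | 0.0072 | 0.0021 | 0.0025 |
| Desk Lamp | 0.0269 | 0.0037 | 0.0056 |
| Plunger | 0.0121 | 0.0042 | 0.0061 |
| Basket Ball | 0.0083 | 0.0018 | 0.0038 |
| Pot | 0.0248 | 0.0031 | 0.0049 |
| Dog Bed | 0.0434 | 0.0146 | 0.0273 |
| Ladle | 0.0095 | 0.0011 | 0.0013 |
| Baseball Bat | 0.0229 | 0.0036 | 0.0065 |
| Cart | 0.0781 | 0.0362 | 0.0559 |
| Tissue Box | 0.0087 | 0.0024 | 0.0026 |
| Egg |  | 0.0005 | 0.0005 |
| Alarm Clock | 0.0103 | 0.0034 | 0.0046 |
| Desk | 0.1419 | 0.0392 | 0.0876 |
| Coffee Machine | 0.0185 | 0.0132 | 0.0151 |
| Soap Bar |  | 0.0005 | 0.0005 |
| Tennis Racket | 0.0099 | 0.0047 | 0.0052 |
| Safe | 0.0163 | 0.0055 | 0.0094 |
| Cloth |  | 0.0026 | 0.0026 |
| Laundry Hamper | 0.0599 | 0.0154 | 0.0352 |
| Vacuum Cleaner | 0.0301 | 0.0183 | 0.0250 |
| Boots |  | 0.0011 | 0.0011 |
| Desktop |  | 0.0316 | 0.0316 |
| Room Decor |  | 0.0130 | 0.0130 |
| Table Top Decor |  | 0.0027 | 0.0027 |
| Ottoman | 0.1021 |  | 0.1021 |
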
    avg_bbox_correct = group[group['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).mean()
    avg_bbox_incorrect = group[~group['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).mean()
    # Area of every ground-truth box for this entity type, then its mean
    avg_bbox_dimensions = group['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1]))
    average_bbox_dimensions = avg_bbox_dimensions.mean()
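
The bounding boxes are stored in the CSV as strings, which is why the code above parses them with `eval`. A possible alternative, shown here only as a sketch, is `ast.literal_eval`, which accepts Python literals such as tuples but does not execute arbitrary code. The helper name `bbox_area` and the sample values are hypothetical.

```python
import ast
import pandas as pd

def bbox_area(text):
    """Area of a box stored as the string '(x1, y1, x2, y2)'."""
    x1, y1, x2, y2 = ast.literal_eval(text)
    return (x2 - x1) * (y2 - y1)

# Toy stand-in for the 'bounding_box' column (values are illustrative only)
toy_boxes = pd.Series(['(0.0, 0.0, 0.5, 0.5)', '(0.25, 0.25, 0.75, 0.5)'])
areas = toy_boxes.apply(bbox_area)
average_area = areas.mean()
```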

Standard deviation

I also computed the standard deviation of each value.

    # Calculate standard deviations
    std_total_matches = group['Match'].std()
    std_average_overlap_index = group['overlap_index'].std()
    std_average_overlap_index_matched = group[group['Match']]['overlap_index'].std()
    std_average_overlap_index_unmatched = group[~group['Match']]['overlap_index'].std()
    std_avg_bbox_correct = group[group['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).std()
    std_avg_bbox_incorrect = group[~group['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).std()
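
Note that pandas' `std()` uses the sample standard deviation (ddof=1) by default, so an entity type with a single row, such as Ottoman or Table Top Decor in the occurrence counts, has an undefined (NaN) standard deviation. A small sketch with made-up values:

```python
import pandas as pd

values = pd.Series([0.2, 0.4])

# pandas default: sample standard deviation (divides by n - 1)
sample_std = values.std()
# Population standard deviation (divides by n), for comparison
population_std = values.std(ddof=0)

# With a single observation the sample standard deviation is undefined
single_std = pd.Series([0.9066]).std()
```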
    

Further operations

The remaining operations on this dataset are:

  1. Convert the values to float
  2. Sort by number of occurrences
  3. Round the values to four decimal places (done by float_format='%.4f' on export)
# Create a DataFrame from the list of calculated statistics
stats_df = pd.DataFrame(entity_stats)
 
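# Convert every column except 'entity_type' to a numeric dtype
# (errors='ignore' is deprecated in pandas 2.2 and later)
numeric_cols = stats_df.columns.drop('entity_type')
stats_df[numeric_cols] = stats_df[numeric_cols].apply(pd.to_numeric)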
# Sort the DataFrame by the 'Number of Occurrences' column in descending order
stats_df = stats_df.sort_values(by='Number of Occurrences', ascending=False)
 
# Export the DataFrame to a CSV file
stats_df.to_csv("entity_statistics_with_std_rounded.csv", index=False, float_format='%.4f')
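
The `'%.4f'` format rounds to the fourth decimal place rather than truncating. The two can differ in the last digit, as this small sketch with a made-up value shows:

```python
value = 0.12346  # illustrative value, not taken from the results

# What float_format='%.4f' produces: rounding to four decimals
rounded = '%.4f' % value

# Plain truncation to four decimals, for comparison
truncated = int(value * 10**4) / 10**4
```

Here `rounded` is the string '0.1235', while `truncated` is 0.1234.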

Full script

import pandas as pd
import numpy as np
 
# Read the CSV file into a DataFrame
df = pd.read_csv("zero_shot_final.csv")
 
# Initialize a list to store calculated statistics for each entity type
entity_stats = []
 
# Calculate statistics for the entire dataset
total_instances_all = len(df)
total_matches_all = df['Match'].sum()
average_overlap_index_all = df['overlap_index'].mean()
average_overlap_index_matched_all = df[df['Match']]['overlap_index'].mean()
average_overlap_index_unmatched_all = df[~df['Match']]['overlap_index'].mean()
avg_bbox_correct_all = df[df['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).mean()
avg_bbox_incorrect_all = df[~df['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).mean()
 
# Calculate standard deviations
std_total_matches_all = df['Match'].std()
std_average_overlap_index_all = df['overlap_index'].std()
std_average_overlap_index_matched_all = df[df['Match']]['overlap_index'].std()
std_average_overlap_index_unmatched_all = df[~df['Match']]['overlap_index'].std()
std_avg_bbox_correct_all = df[df['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).std()
std_avg_bbox_incorrect_all = df[~df['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).std()
 
# Calculate average bounding box dimensions for all instances
avg_bbox_dimensions_all = df['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1]))
average_bbox_dimensions_all = avg_bbox_dimensions_all.mean()
std_avg_bbox_dimensions_all = avg_bbox_dimensions_all.std()
 
# Calculate percentage of times there's a match for the entire dataset
if total_instances_all > 0:
    percentage_match_all = (total_matches_all / total_instances_all) * 100
else:
    percentage_match_all = 0
 
# Append the calculated statistics for the entire dataset to the list
entity_stats.append({
    'entity_type': 'Total',
    'Number of Occurrences': total_instances_all,
    'Percentage of Matches': percentage_match_all,
    'Average Overlapping Index': average_overlap_index_all,
    'Std Average Overlapping Index': std_average_overlap_index_all,
    'Average Overlapping Index (Matched)': average_overlap_index_matched_all,
    'Std Average Overlapping Index (Matched)': std_average_overlap_index_matched_all,
    'Average Overlapping Index (Unmatched)': average_overlap_index_unmatched_all,
    'Std Average Overlapping Index (Unmatched)': std_average_overlap_index_unmatched_all,
    'Avg BBox Dimensions (Correct)': avg_bbox_correct_all,
    'Std Avg BBox Dimensions (Correct)': std_avg_bbox_correct_all,
    'Avg BBox Dimensions (Incorrect)': avg_bbox_incorrect_all,
    'Std Avg BBox Dimensions (Incorrect)': std_avg_bbox_incorrect_all,
    'Average BBox Dimensions (All)': average_bbox_dimensions_all,
    'Std Average BBox Dimensions (All)': std_avg_bbox_dimensions_all
})
 
# Group the DataFrame by 'entity_type'
grouped = df.groupby('entity_type')
 
# Iterate over each entity type
for entity_type, group in grouped:
    # Calculate statistics for the current entity type
    total_instances = len(group)
    total_matches = group['Match'].sum()
    average_overlap_index = group['overlap_index'].mean()
    average_overlap_index_matched = group[group['Match']]['overlap_index'].mean()
    average_overlap_index_unmatched = group[~group['Match']]['overlap_index'].mean()
    avg_bbox_correct = group[group['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).mean()
    avg_bbox_incorrect = group[~group['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).mean()
    
    # Calculate standard deviations
    std_total_matches = group['Match'].std()
    std_average_overlap_index = group['overlap_index'].std()
    std_average_overlap_index_matched = group[group['Match']]['overlap_index'].std()
    std_average_overlap_index_unmatched = group[~group['Match']]['overlap_index'].std()
    std_avg_bbox_correct = group[group['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).std()
    std_avg_bbox_incorrect = group[~group['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).std()
    
    # Calculate average bounding box dimensions for current entity type
    avg_bbox_dimensions = group['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1]))
    average_bbox_dimensions = avg_bbox_dimensions.mean()
    std_avg_bbox_dimensions = avg_bbox_dimensions.std()
    
    # Calculate percentage of times there's a match
    if total_instances > 0:
        percentage_match = (total_matches / total_instances) * 100
    else:
        percentage_match = 0
 
    # Append the calculated statistics to the list
    entity_stats.append({
        'entity_type': entity_type,
        'Number of Occurrences': total_instances,
        'Percentage of Matches': percentage_match,
        'Average Overlapping Index': average_overlap_index,
        'Std Average Overlapping Index': std_average_overlap_index,
        'Average Overlapping Index (Matched)': average_overlap_index_matched,
        'Std Average Overlapping Index (Matched)': std_average_overlap_index_matched,
        'Average Overlapping Index (Unmatched)': average_overlap_index_unmatched,
        'Std Average Overlapping Index (Unmatched)': std_average_overlap_index_unmatched,
        'Avg BBox Dimensions (Correct)': avg_bbox_correct,
        'Std Avg BBox Dimensions (Correct)': std_avg_bbox_correct,
        'Avg BBox Dimensions (Incorrect)': avg_bbox_incorrect,
        'Std Avg BBox Dimensions (Incorrect)': std_avg_bbox_incorrect,
        'Average BBox Dimensions (All)': average_bbox_dimensions,
        'Std Average BBox Dimensions (All)': std_avg_bbox_dimensions
    })
 
# Create a DataFrame from the list of calculated statistics
stats_df = pd.DataFrame(entity_stats)
 
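# Convert every column except 'entity_type' to a numeric dtype
# (errors='ignore' is deprecated in pandas 2.2 and later)
numeric_cols = stats_df.columns.drop('entity_type')
stats_df[numeric_cols] = stats_df[numeric_cols].apply(pd.to_numeric)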
# Sort the DataFrame by the 'Number of Occurrences' column in descending order
stats_df = stats_df.sort_values(by='Number of Occurrences', ascending=False)
 
# Export the DataFrame to a CSV file
stats_df.to_csv("entity_statistics_with_std_rounded.csv", index=False, float_format='%.4f')
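
The script computes the same statistics twice, once for the whole dataset and once per entity type. One way to avoid the duplication, sketched below as a suggestion and not as the script actually used, is a single helper applied to both. The names `bbox_area`, `summarize` and `toy_df` are hypothetical, only a subset of the output columns is shown, and `ast.literal_eval` is swapped in for `eval`.

```python
import ast
import pandas as pd

def bbox_area(text):
    """Area of a box stored as the string '(x1, y1, x2, y2)'."""
    x1, y1, x2, y2 = ast.literal_eval(text)
    return (x2 - x1) * (y2 - y1)

def summarize(frame, label):
    """One output row for `frame`; only a subset of the columns is shown."""
    matched = frame['Match'].astype(bool)
    areas = frame['bounding_box'].apply(bbox_area)
    return {
        'entity_type': label,
        'Number of Occurrences': len(frame),
        'Percentage of Matches': matched.mean() * 100 if len(frame) else 0,
        'Average Overlapping Index': frame['overlap_index'].mean(),
        'Average Overlapping Index (Matched)': frame.loc[matched, 'overlap_index'].mean(),
        'Average Overlapping Index (Unmatched)': frame.loc[~matched, 'overlap_index'].mean(),
        'Avg BBox Dimensions (Correct)': areas[matched].mean(),
        'Avg BBox Dimensions (Incorrect)': areas[~matched].mean(),
        'Average BBox Dimensions (All)': areas.mean(),
    }

# Toy stand-in for zero_shot_final.csv (values are illustrative only)
toy_df = pd.DataFrame({
    'entity_type': ['Painting', 'Painting', 'Chair'],
    'bounding_box': ['(0.0, 0.0, 0.5, 0.5)', '(0.0, 0.0, 0.25, 0.5)', '(0.5, 0.5, 1.0, 1.0)'],
    'overlap_index': [0.8, 0.1, 0.6],
    'Match': [True, False, True],
})

# Same helper for the 'Total' row and for each entity type
rows = [summarize(toy_df, 'Total')]
rows += [summarize(group, name) for name, group in toy_df.groupby('entity_type')]
summary_df = pd.DataFrame(rows).sort_values('Number of Occurrences', ascending=False)
```

The standard deviation columns could be added to the dictionary in the same way.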