Calcolo statistiche per valutazione

Questo documento spiega come sono state calcolate le statistiche riguardanti la valutazione zero shot del modello Kosmos2 su vari dataset.

Alla fine della fase di evaluation, ho ottenuto un file chiamato “zero_shot_final.csv”. Questo file è una tabella di questo tipo:

environment	entity_type	lexical_references	image_bbox	image_normal	bounding_box	kosmos_bounding_box	overlap_index	Match
2416	Painting	[‘quadro’]	Robocup/2416/images/LivingRoom/bounding_box/position_0/2416_LivingRoom_bounding_box_pos_0_180.jpg	Robocup/2416/images/LivingRoom/normal/position_0/2416_LivingRoom_pos_0_180.jpg	(0.24666666666666667, 0.2733333333333333, 0.8933333333333333, 0.5966666666666667)	(0.234375, 0.265625, 0.890625, 0.609375)	0.9194192970203593	True
2644	Painting	[‘quadro’]	Simpleset/2644/images/LivingRoom/bounding_box/position_5/2644_LivingRoom_bounding_box_pos_5_90.jpg	Simpleset/2644/images/LivingRoom/normal/position_5/2644_LivingRoom_pos_5_90.jpg	(0.8866666666666667, 0.42, 0.9983333333333333, 0.755)	(0.015625, 0.015625, 0.359375, 0.703125)	0.0	False
2746	Dining Table	[‘tavolo_da_pranzo’]	S4R/2746/images/LivingRoom/bounding_box/position_1/2746_LivingRoom_bounding_box_pos_1_270.jpg	S4R/2746/images/LivingRoom/normal/position_1/2746_LivingRoom_pos_1_270.jpg	(0.31333333333333335, 0.29, 0.385, 0.3616666666666667)	(0.015625, 0.015625, 0.328125, 0.859375)	0.003959207069253051	False
2684	Arm Chair	[‘poltrona’]	S4R/2684/images/LivingRoom/bounding_box/position_3/2684_LivingRoom_bounding_box_pos_3_0.jpg	S4R/2684/images/LivingRoom/normal/position_3/2684_LivingRoom_pos_3_0.jpg	(0.36333333333333334, 0.315, 0.4583333333333333, 0.40166666666666667)	(0.359375, 0.296875, 0.453125, 0.421875)	0.6394293865905849	True
2279	Painting	[‘quadro’]	Robocup/2279/images/LivingRoom/bounding_box/position_0/2279_LivingRoom_bounding_box_pos_0_0.jpg	Robocup/2279/images/LivingRoom/normal/position_0/2279_LivingRoom_pos_0_0.jpg	(0.5533333333333333, 0.07, 0.7283333333333334, 0.20333333333333334)	(0.546875, 0.078125, 0.734375, 0.203125)	0.8786610878661091	True
3353	Floor Lamp	[‘lampada_da_terra’]	Rockin2/3353/images/LivingRoom/bounding_box/position_0/3353_LivingRoom_bounding_box_pos_0_0.jpg	Rockin2/3353/images/LivingRoom/normal/position_0/3353_LivingRoom_pos_0_0.jpg	(0.41, 0.26166666666666666, 0.5066666666666667, 0.4716666666666667)	(0.265625, 0.421875, 0.609375, 0.890625)	0.02725175434888814	False
3385	Garbage Can	[‘pattumiera’]	Rockin2/3385/images/LivingRoom/bounding_box/position_3/3385_LivingRoom_bounding_box_pos_3_180.jpg	Rockin2/3385/images/LivingRoom/normal/position_3/3385_LivingRoom_pos_3_180.jpg	(0.5366666666666666, 0.605, 0.7216666666666667, 0.7733333333333333)	(0.515625, 0.609375, 0.734375, 0.796875)	0.725219167164774	True
3068	Chair	[‘sedia’]	Rockin1/3068/images/LivingRoom/bounding_box/position_2/3068_LivingRoom_bounding_box_pos_2_90.jpg	Rockin1/3068/images/LivingRoom/normal/position_2/3068_LivingRoom_pos_2_90.jpg	(0.47833333333333333, 0.33666666666666667, 0.5533333333333333, 0.56)	(0.390625, 0.390625, 0.984375, 0.703125)	0.06700181308719304	False
Dove sostanzialmente è contenuta: l’entità, l’immagine presa in considerazione, il boundig box target (preso come ground truth) , il bounding box generato dal modello, l’overlapping index, ed un valore booleano che indica se il modello è riuscito a trovare l’immagine o meno.

A questo punto ho computato una serie di statistiche per ogni tipo diverso di entità:

Il numero di occorrenze per ogni entità:

entity_type	Number of Occurrences
Total	10000
Painting	751
Cell Phone	628
Remote Control	518
Book	485
Chair	443
Pen	356
Dining Table	317
Box	214
Key Chain	207
Counter Top	204
Bowl	194
Bottle	192
House Plant	192
Television	185
Statue	169
Plate	165
Sofa	161
Laptop	161
Fridge	157
Knife	150
Bed	141
Dresser	141
Wine Bottle	138
Garbage Can	135
Fork	129
Spoon	128
Pillow	125
Mug	119
Arm Chair	114
Bread	114
Spray Bottle	113
Vase	110
Soap Bottle	106
Spatula	94
Pencil	91
Toaster	89
Shelving Unit	88
Toilet	86
Kettle	84
TV Stand	84
Butter Knife	83
Newspaper	76
Apple	75
Cup	73
Washing Machine	73
Side Table	72
Candle	70
Sink	64
Floor Lamp	64
Credit Card	60
Pepper Shaker	58
Potato	57
Salt Shaker	56
Tomato	56
Stool	55
Pan	54
Garbage Bag	54
Faucet	54
Dish Sponge	52
Lettuce	51
Microwave	46
Toilet Paper	46
Watch	43
Teddy Bear	43
Paper Towel Roll	38
Desk Lamp	37
Plunger	37
Basket Ball	35
Pot	35
Dog Bed	34
Ladle	34
Baseball Bat	33
Cart	32
Tissue Box	26
Egg	23
Alarm Clock	22
Desk	17
Coffee Machine	14
Soap Bar	13
Tennis Racket	11
Safe	11
Cloth	10
Laundry Hamper	9
Vacuum Cleaner	7
Boots	3
Desktop	2
Room Decor	2
Table Top Decor	1
Ottoman	1
Questo indica quante volte ognuna di queste entità è apparsa nei dati che abbiamo valutato

# Group the DataFrame by 'entity_type'
grouped = df.groupby('entity_type')
 
# Iterate over each entity type
for entity_type, group in grouped:
    # Calculate statistics for the current entity type
    total_matches = group['Match'].sum()

La percentuale di istanze corrette

Per ogni tipo di entità diverso, calcola la percentuale di istanze che il modello ha determinato correttamente

entity_type	Percentage of Matches
Total	22.2700
Painting	40.3462
Cell Phone	2.0701
Remote Control	1.7375
Book	8.8660
Chair	25.0564
Pen	0.0000
Dining Table	58.6751
Box	10.2804
Key Chain	0.4831
Counter Top	37.7451
Bowl	5.6701
Bottle	6.7708
House Plant	46.3542
Television	63.7838
Statue	20.7101
Plate	4.2424
Sofa	65.8385
Laptop	16.7702
Fridge	66.2420
Knife	2.6667
Bed	52.4823
Dresser	61.7021
Wine Bottle	13.7681
Garbage Can	66.6667
Fork	0.0000
Spoon	0.0000
Pillow	15.2000
Mug	4.2017
Arm Chair	54.3860
Bread	7.8947
Spray Bottle	15.0442
Vase	8.1818
Soap Bottle	6.6038
Spatula	0.0000
Pencil	1.0989
Toaster	12.3596
Shelving Unit	61.3636
Toilet	66.2791
Kettle	5.9524
TV Stand	35.7143
Butter Knife	0.0000
Newspaper	5.2632
Apple	2.6667
Cup	4.1096
Washing Machine	69.8630
Side Table	40.2778
Candle	1.4286
Sink	71.8750
Floor Lamp	59.3750
Credit Card	0.0000
Pepper Shaker	0.0000
Potato	5.2632
Salt Shaker	0.0000
Tomato	8.9286
Stool	56.3636
Pan	7.4074
Garbage Bag	68.5185
Faucet	11.1111
Dish Sponge	0.0000
Lettuce	11.7647
Microwave	13.0435
Toilet Paper	4.3478
Watch	0.0000
Teddy Bear	48.8372
Paper Towel Roll	7.8947
Desk Lamp	8.1081
Plunger	24.3243
Basket Ball	31.4286
Pot	8.5714
Dog Bed	44.1176
Ladle	2.9412
Baseball Bat	15.1515
Cart	46.8750
Tissue Box	3.8462
Egg	0.0000
Alarm Clock	18.1818
Desk	47.0588
Coffee Machine	35.7143
Soap Bar	0.0000
Tennis Racket	9.0909
Safe	36.3636
Cloth	0.0000
Laundry Hamper	44.4444
Vacuum Cleaner	57.1429
Boots	0.0000
Desktop	0.0000
Room Decor	0.0000
Table Top Decor	0.0000
Ottoman	100.0000

    std_total_matches = group['Match'].std()
    # Calculate percentage of times there's a match
    if total_instances > 0:
        percentage_match = (total_matches / total_instances) * 100
    else:
        percentage_match = 0

Overlapping index medio

entity_type	Average Overlapping Index	Average Overlapping Index (Matched)	Average Overlapping Index (Unmatched)
Total	0.1784	0.7687	0.0093
Painting	0.3207	0.7896	0.0037
Cell Phone	0.0169	0.6653	0.0032
Remote Control	0.0153	0.6365	0.0043
Book	0.0693	0.7046	0.0075
Chair	0.2052	0.7385	0.0269
Pen	0.0011		0.0011
Dining Table	0.4669	0.7749	0.0296
Box	0.0849	0.7023	0.0142
Key Chain	0.0043	0.6105	0.0013
Counter Top	0.3100	0.8014	0.0121
Bowl	0.0428	0.6360	0.0071
Bottle	0.0475	0.6471	0.0040
House Plant	0.3412	0.7155	0.0178
Television	0.4931	0.7677	0.0095
Statue	0.1470	0.6560	0.0140
Plate	0.0334	0.7049	0.0036
Sofa	0.5580	0.8274	0.0389
Laptop	0.1320	0.7259	0.0123
Fridge	0.5615	0.8449	0.0053
Knife	0.0227	0.7009	0.0042
Bed	0.4395	0.8191	0.0202
Dresser	0.5399	0.8548	0.0326
Wine Bottle	0.0986	0.6525	0.0102
Garbage Can	0.5233	0.7666	0.0367
Fork	0.0019		0.0019
Spoon	0.0014		0.0014
Pillow	0.1168	0.6873	0.0146
Mug	0.0310	0.6205	0.0052
Arm Chair	0.4404	0.7965	0.0159
Bread	0.0632	0.6982	0.0088
Spray Bottle	0.1057	0.6653	0.0066
Vase	0.0639	0.6518	0.0115
Soap Bottle	0.0466	0.6285	0.0054
Spatula	0.0058		0.0058
Pencil	0.0070	0.5701	0.0007
Toaster	0.0912	0.6816	0.0080
Shelving Unit	0.5259	0.8304	0.0421
Toilet	0.5122	0.7458	0.0529
Kettle	0.0475	0.6470	0.0096
TV Stand	0.2836	0.7525	0.0230
Butter Knife	0.0045		0.0045
Newspaper	0.0392	0.6769	0.0037
Apple	0.0223	0.6565	0.0050
Cup	0.0306	0.6311	0.0049
Washing Machine	0.5904	0.8256	0.0452
Side Table	0.3204	0.7835	0.0081
Candle	0.0216	0.8907	0.0090
Sink	0.5634	0.7741	0.0251
Floor Lamp	0.4851	0.7990	0.0264
Credit Card	0.0010		0.0010
Pepper Shaker	0.0020		0.0020
Potato	0.0349	0.6414	0.0012
Salt Shaker	0.0010		0.0010
Tomato	0.0581	0.6345	0.0016
Stool	0.4302	0.7595	0.0049
Pan	0.0531	0.6368	0.0064
Garbage Bag	0.5245	0.7509	0.0318
Faucet	0.1033	0.7218	0.0260
Dish Sponge	0.0045		0.0045
Lettuce	0.0811	0.6854	0.0006
Microwave	0.1258	0.7665	0.0297
Toilet Paper	0.0321	0.6579	0.0037
Watch	0.0009		0.0009
Teddy Bear	0.3605	0.6945	0.0417
Paper Towel Roll	0.0512	0.5987	0.0043
Desk Lamp	0.0821	0.7218	0.0257
Plunger	0.1827	0.6990	0.0168
Basket Ball	0.2573	0.6320	0.0856
Pot	0.0637	0.7074	0.0033
Dog Bed	0.3621	0.8205	0.0003
Ladle	0.0248	0.7196	0.0038
Baseball Bat	0.1196	0.7497	0.0071
Cart	0.3785	0.8061	0.0012
Tissue Box	0.0357	0.7297	0.0079
Egg	0.0015		0.0015
Alarm Clock	0.1260	0.6687	0.0055
Desk	0.4079	0.8446	0.0197
Coffee Machine	0.2354	0.6260	0.0183
Soap Bar	0.0010		0.0010
Tennis Racket	0.0718	0.5645	0.0226
Safe	0.2719	0.7300	0.0101
Cloth	0.0000		0.0000
Laundry Hamper	0.3525	0.7875	0.0044
Vacuum Cleaner	0.4380	0.7664	0.0000
Boots	0.0068		0.0068
Desktop	0.0408		0.0408
Room Decor	0.0123		0.0123
Table Top Decor	0.0558		0.0558
Ottoman	0.9066	0.9066

    average_overlap_index_all = df['overlap_index'].mean()
    std_average_overlap_index_matched = group[group['Match']]['overlap_index'].std()
    std_average_overlap_index_unmatched = group[~group['Match']]['overlap_index'].std()

Dimensione media dei bounding box

entity_type	Avg BBox Dimensions (Correct)	Avg BBox Dimensions (Incorrect)	Average BBox Dimensions (All)
Total	0.0763	0.0067	0.0222
Painting	0.0672	0.0141	0.0355
Cell Phone	0.0063	0.0010	0.0011
Remote Control	0.0098	0.0008	0.0010
Book	0.0195	0.0028	0.0043
Chair	0.0444	0.0113	0.0196
Pen		0.0004	0.0004
Dining Table	0.0732	0.0289	0.0549
Box	0.0164	0.0051	0.0062
Key Chain	0.0055	0.0003	0.0004
Counter Top	0.1864	0.0679	0.1126
Bowl	0.0098	0.0029	0.0033
Bottle	0.0105	0.0017	0.0023
House Plant	0.0320	0.0045	0.0172
Television	0.1015	0.0191	0.0716
Statue	0.0165	0.0032	0.0059
Plate	0.0153	0.0012	0.0018
Sofa	0.1050	0.0236	0.0772
Laptop	0.0271	0.0047	0.0084
Fridge	0.2056	0.0652	0.1582
Knife	0.0139	0.0015	0.0018
Bed	0.1410	0.0271	0.0869
Dresser	0.1222	0.0198	0.0830
Wine Bottle	0.0093	0.0054	0.0059
Garbage Can	0.0274	0.0102	0.0217
Fork		0.0007	0.0007
Spoon		0.0004	0.0004
Pillow	0.0200	0.0028	0.0054
Mug	0.0116	0.0015	0.0019
Arm Chair	0.0597	0.0206	0.0419
Bread	0.0155	0.0018	0.0029
Spray Bottle	0.0110	0.0012	0.0027
Vase	0.0206	0.0015	0.0031
Soap Bottle	0.0101	0.0025	0.0030
Spatula		0.0020	0.0020
Pencil	0.0028	0.0003	0.0003
Toaster	0.0181	0.0042	0.0060
Shelving Unit	0.1446	0.0368	0.1030
Toilet	0.0835	0.0132	0.0598
Kettle	0.0128	0.0039	0.0044
TV Stand	0.1123	0.0213	0.0538
Butter Knife		0.0002	0.0002
Newspaper	0.0225	0.0013	0.0024
Apple	0.0037	0.0019	0.0019
Cup	0.0072	0.0015	0.0017
Washing Machine	0.0982	0.0260	0.0764
Side Table	0.0557	0.0172	0.0327
Candle	0.0061	0.0007	0.0008
Sink	0.0772	0.0175	0.0604
Floor Lamp	0.1211	0.0273	0.0830
Credit Card		0.0003	0.0003
Pepper Shaker		0.0019	0.0019
Potato	0.0039	0.0020	0.0021
Salt Shaker		0.0005	0.0005
Tomato	0.0044	0.0005	0.0008
Stool	0.0257	0.0081	0.0181
Pan	0.0166	0.0039	0.0048
Garbage Bag	0.0218	0.0055	0.0167
Faucet	0.0227	0.0107	0.0120
Dish Sponge		0.0006	0.0006
Lettuce	0.0104	0.0007	0.0019
Microwave	0.0431	0.0089	0.0133
Toilet Paper	0.0064	0.0011	0.0013
Watch		0.0003	0.0003
Teddy Bear	0.0152	0.0027	0.0088
Paper Towel Roll	0.0072	0.0021	0.0025
Desk Lamp	0.0269	0.0037	0.0056
Plunger	0.0121	0.0042	0.0061
Basket Ball	0.0083	0.0018	0.0038
Pot	0.0248	0.0031	0.0049
Dog Bed	0.0434	0.0146	0.0273
Ladle	0.0095	0.0011	0.0013
Baseball Bat	0.0229	0.0036	0.0065
Cart	0.0781	0.0362	0.0559
Tissue Box	0.0087	0.0024	0.0026
Egg		0.0005	0.0005
Alarm Clock	0.0103	0.0034	0.0046
Desk	0.1419	0.0392	0.0876
Coffee Machine	0.0185	0.0132	0.0151
Soap Bar		0.0005	0.0005
Tennis Racket	0.0099	0.0047	0.0052
Safe	0.0163	0.0055	0.0094
Cloth		0.0026	0.0026
Laundry Hamper	0.0599	0.0154	0.0352
Vacuum Cleaner	0.0301	0.0183	0.0250
Boots		0.0011	0.0011
Desktop		0.0316	0.0316
Room Decor		0.0130	0.0130
Table Top Decor		0.0027	0.0027
Ottoman	0.1021		0.1021

    avg_bbox_correct = group[group['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).mean()
    avg_bbox_incorrect = group[~group['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).mean()
    avg_bbox_dimensions = group['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1]))

Standard deviation

Inoltre ho calcolato la deviazione standard per ogni valore

    # Calculate standard deviations
    std_total_matches = group['Match'].std()
    std_average_overlap_index = group['overlap_index'].std()
    std_average_overlap_index_matched = group[group['Match']]['overlap_index'].std()
    std_average_overlap_index_unmatched = group[~group['Match']]['overlap_index'].std()
    std_avg_bbox_correct = group[group['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).std()
    std_avg_bbox_incorrect = group[~group['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).std()

Ulteriori operazioni

Altre operazioni su questo dataset sono:

Convertire in float i valori
Ordinare per numero di occorrenze
troncare i valori al quarto decimale

# Create a DataFrame from the list of calculated statistics
stats_df = pd.DataFrame(entity_stats)
 
# Convert all numeric columns to float
stats_df = stats_df.apply(pd.to_numeric, errors='ignore')
# Sort the DataFrame by the 'Number of Occurrences' column in descending order
stats_df = stats_df.sort_values(by='Number of Occurrences', ascending=False)
 
# Export the DataFrame to a CSV file
stats_df.to_csv("entity_statistics_with_std_rounded.csv", index=False, float_format='%.4f')

File completo

import pandas as pd
import numpy as np
 
# Read the CSV file into a DataFrame
df = pd.read_csv("zero_shot_final.csv")
 
# Initialize a list to store calculated statistics for each entity type
entity_stats = []
 
# Calculate statistics for the entire dataset
total_instances_all = len(df)
total_matches_all = df['Match'].sum()
average_overlap_index_all = df['overlap_index'].mean()
average_overlap_index_matched_all = df[df['Match']]['overlap_index'].mean()
average_overlap_index_unmatched_all = df[~df['Match']]['overlap_index'].mean()
avg_bbox_correct_all = df[df['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).mean()
avg_bbox_incorrect_all = df[~df['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).mean()
 
# Calculate standard deviations
std_total_matches_all = df['Match'].std()
std_average_overlap_index_all = df['overlap_index'].std()
std_average_overlap_index_matched_all = df[df['Match']]['overlap_index'].std()
std_average_overlap_index_unmatched_all = df[~df['Match']]['overlap_index'].std()
std_avg_bbox_correct_all = df[df['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).std()
std_avg_bbox_incorrect_all = df[~df['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).std()
 
# Calculate average bounding box dimensions for all instances
avg_bbox_dimensions_all = df['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1]))
average_bbox_dimensions_all = avg_bbox_dimensions_all.mean()
std_avg_bbox_dimensions_all = avg_bbox_dimensions_all.std()
 
# Calculate percentage of times there's a match for the entire dataset
if total_instances_all > 0:
    percentage_match_all = (total_matches_all / total_instances_all) * 100
else:
    percentage_match_all = 0
 
# Append the calculated statistics for the entire dataset to the list
entity_stats.append({
    'entity_type': 'Total',
    'Number of Occurrences': total_instances_all,
    'Percentage of Matches': percentage_match_all,
    'Average Overlapping Index': average_overlap_index_all,
    'Std Average Overlapping Index': std_average_overlap_index_all,
    'Average Overlapping Index (Matched)': average_overlap_index_matched_all,
    'Std Average Overlapping Index (Matched)': std_average_overlap_index_matched_all,
    'Average Overlapping Index (Unmatched)': average_overlap_index_unmatched_all,
    'Std Average Overlapping Index (Unmatched)': std_average_overlap_index_unmatched_all,
    'Avg BBox Dimensions (Correct)': avg_bbox_correct_all,
    'Std Avg BBox Dimensions (Correct)': std_avg_bbox_correct_all,
    'Avg BBox Dimensions (Incorrect)': avg_bbox_incorrect_all,
    'Std Avg BBox Dimensions (Incorrect)': std_avg_bbox_incorrect_all,
    'Average BBox Dimensions (All)': average_bbox_dimensions_all,
    'Std Average BBox Dimensions (All)': std_avg_bbox_dimensions_all
})
 
# Group the DataFrame by 'entity_type'
grouped = df.groupby('entity_type')
 
# Iterate over each entity type
for entity_type, group in grouped:
    # Calculate statistics for the current entity type
    total_instances = len(group)
    total_matches = group['Match'].sum()
    average_overlap_index = group['overlap_index'].mean()
    average_overlap_index_matched = group[group['Match']]['overlap_index'].mean()
    average_overlap_index_unmatched = group[~group['Match']]['overlap_index'].mean()
    avg_bbox_correct = group[group['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).mean()
    avg_bbox_incorrect = group[~group['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).mean()
    
    # Calculate standard deviations
    std_total_matches = group['Match'].std()
    std_average_overlap_index = group['overlap_index'].std()
    std_average_overlap_index_matched = group[group['Match']]['overlap_index'].std()
    std_average_overlap_index_unmatched = group[~group['Match']]['overlap_index'].std()
    std_avg_bbox_correct = group[group['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).std()
    std_avg_bbox_incorrect = group[~group['Match']]['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1])).std()
    
    # Calculate average bounding box dimensions for current entity type
    avg_bbox_dimensions = group['bounding_box'].apply(eval).apply(lambda x: (x[2]-x[0])*(x[3]-x[1]))
    average_bbox_dimensions = avg_bbox_dimensions.mean()
    std_avg_bbox_dimensions = avg_bbox_dimensions.std()
    
    # Calculate percentage of times there's a match
    if total_instances > 0:
        percentage_match = (total_matches / total_instances) * 100
    else:
        percentage_match = 0
 
    # Append the calculated statistics to the list
    entity_stats.append({
        'entity_type': entity_type,
        'Number of Occurrences': total_instances,
        'Percentage of Matches': percentage_match,
        'Average Overlapping Index': average_overlap_index,
        'Std Average Overlapping Index': std_average_overlap_index,
        'Average Overlapping Index (Matched)': average_overlap_index_matched,
        'Std Average Overlapping Index (Matched)': std_average_overlap_index_matched,
        'Average Overlapping Index (Unmatched)': average_overlap_index_unmatched,
        'Std Average Overlapping Index (Unmatched)': std_average_overlap_index_unmatched,
        'Avg BBox Dimensions (Correct)': avg_bbox_correct,
        'Std Avg BBox Dimensions (Correct)': std_avg_bbox_correct,
        'Avg BBox Dimensions (Incorrect)': avg_bbox_incorrect,
        'Std Avg BBox Dimensions (Incorrect)': std_avg_bbox_incorrect,
        'Average BBox Dimensions (All)': average_bbox_dimensions,
        'Std Average BBox Dimensions (All)': std_avg_bbox_dimensions
    })
 
# Create a DataFrame from the list of calculated statistics
stats_df = pd.DataFrame(entity_stats)
 
# Convert all numeric columns to float
stats_df = stats_df.apply(pd.to_numeric, errors='ignore')
# Sort the DataFrame by the 'Number of Occurrences' column in descending order
stats_df = stats_df.sort_values(by='Number of Occurrences', ascending=False)
 
# Export the DataFrame to a CSV file
stats_df.to_csv("entity_statistics_with_std_rounded.csv", index=False, float_format='%.4f')

🤖 Bruno Gatti

Explorer