Overview

Dataset statistics

Number of variables: 64
Number of observations: 81488
Missing cells: 2653069
Missing cells (%): 50.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 227.2 MiB
Average record size in memory: 2.9 KiB
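
The derived figures in this block follow from the raw counts. A minimal arithmetic check, using only numbers copied from the statistics above (the variable names are illustrative):

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 64
n_observations = 81488
missing_cells = 2653069
total_size_mib = 227.2

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size in KiB (1 MiB = 1024 KiB).
record_size_kib = total_size_mib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {record_size_kib:.1f} KiB")
```

Roughly half of all cells are empty, which matches the per-variable missing counts listed under Warnings.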

Variable types

Numeric: 1
Categorical: 63

Warnings

Line has a high cardinality: 61 distinct values High cardinality
Gene 1 has a high cardinality: 311 distinct values High cardinality
Gene 2 has a high cardinality: 291 distinct values High cardinality
Gene 3 has a high cardinality: 233 distinct values High cardinality
Gene 4 has a high cardinality: 186 distinct values High cardinality
Gene 5 has a high cardinality: 156 distinct values High cardinality
Gene 6 has a high cardinality: 125 distinct values High cardinality
Gene 7 has a high cardinality: 119 distinct values High cardinality
Gene 8 has a high cardinality: 115 distinct values High cardinality
Gene 9 has a high cardinality: 116 distinct values High cardinality
Gene 10 has a high cardinality: 103 distinct values High cardinality
Gene 11 has a high cardinality: 103 distinct values High cardinality
Gene 12 has a high cardinality: 105 distinct values High cardinality
Gene 13 has a high cardinality: 104 distinct values High cardinality
Gene 14 has a high cardinality: 118 distinct values High cardinality
Gene 15 has a high cardinality: 127 distinct values High cardinality
Gene 16 has a high cardinality: 131 distinct values High cardinality
Gene 17 has a high cardinality: 137 distinct values High cardinality
Gene 18 has a high cardinality: 145 distinct values High cardinality
Gene 19 has a high cardinality: 146 distinct values High cardinality
Gene 20 has a high cardinality: 147 distinct values High cardinality
Gene 21 has a high cardinality: 145 distinct values High cardinality
Gene 22 has a high cardinality: 135 distinct values High cardinality
Gene 23 has a high cardinality: 148 distinct values High cardinality
Gene 24 has a high cardinality: 170 distinct values High cardinality
Gene 25 has a high cardinality: 174 distinct values High cardinality
Gene 26 has a high cardinality: 178 distinct values High cardinality
Gene 27 has a high cardinality: 184 distinct values High cardinality
Gene 28 has a high cardinality: 188 distinct values High cardinality
Gene 29 has a high cardinality: 188 distinct values High cardinality
Gene 30 has a high cardinality: 185 distinct values High cardinality
Gene 31 has a high cardinality: 194 distinct values High cardinality
Gene 32 has a high cardinality: 204 distinct values High cardinality
Gene 33 has a high cardinality: 209 distinct values High cardinality
Gene 34 has a high cardinality: 212 distinct values High cardinality
Gene 35 has a high cardinality: 212 distinct values High cardinality
Gene 36 has a high cardinality: 197 distinct values High cardinality
Gene 37 has a high cardinality: 189 distinct values High cardinality
Gene 38 has a high cardinality: 183 distinct values High cardinality
Gene 39 has a high cardinality: 179 distinct values High cardinality
Gene 40 has a high cardinality: 178 distinct values High cardinality
Gene 41 has a high cardinality: 174 distinct values High cardinality
Gene 42 has a high cardinality: 171 distinct values High cardinality
Gene 43 has a high cardinality: 175 distinct values High cardinality
Gene 44 has a high cardinality: 178 distinct values High cardinality
Gene 45 has a high cardinality: 171 distinct values High cardinality
Gene 46 has a high cardinality: 161 distinct values High cardinality
Gene 47 has a high cardinality: 151 distinct values High cardinality
Gene 48 has a high cardinality: 144 distinct values High cardinality
Gene 49 has a high cardinality: 143 distinct values High cardinality
Gene 50 has a high cardinality: 138 distinct values High cardinality
Gene 51 has a high cardinality: 118 distinct values High cardinality
Gene 52 has a high cardinality: 116 distinct values High cardinality
Gene 53 has a high cardinality: 109 distinct values High cardinality
Gene 54 has a high cardinality: 94 distinct values High cardinality
Gene 55 has a high cardinality: 72 distinct values High cardinality
Gene 56 has a high cardinality: 51 distinct values High cardinality
Gene 58 is highly correlated with Gene 59 and 1 other field High correlation
Gene 60 is highly correlated with Gene 59 and 1 other field High correlation
Gene 59 is highly correlated with Gene 58 and 2 other fields High correlation
Gene 62 is highly correlated with Gene 58 and 8 other fields High correlation
Gene 55 is highly correlated with Gene 62 and 3 other fields High correlation
Gene 56 is highly correlated with Gene 62 and 2 other fields High correlation
Gene 61 is highly correlated with Gene 62 High correlation
Gene 54 is highly correlated with Gene 62 and 1 other field High correlation
Line is highly correlated with Gene 62 High correlation
Gene 57 is highly correlated with Gene 62 and 2 other fields High correlation
Gene 4 has 1717 (2.1%) missing values Missing
Gene 5 has 2918 (3.6%) missing values Missing
Gene 6 has 4211 (5.2%) missing values Missing
Gene 7 has 5354 (6.6%) missing values Missing
Gene 8 has 6874 (8.4%) missing values Missing
Gene 9 has 8391 (10.3%) missing values Missing
Gene 10 has 10114 (12.4%) missing values Missing
Gene 11 has 11748 (14.4%) missing values Missing
Gene 12 has 13093 (16.1%) missing values Missing
Gene 13 has 14336 (17.6%) missing values Missing
Gene 14 has 15683 (19.2%) missing values Missing
Gene 15 has 17067 (20.9%) missing values Missing
Gene 16 has 18340 (22.5%) missing values Missing
Gene 17 has 19674 (24.1%) missing values Missing
Gene 18 has 21029 (25.8%) missing values Missing
Gene 19 has 22335 (27.4%) missing values Missing
Gene 20 has 23841 (29.3%) missing values Missing
Gene 21 has 25145 (30.9%) missing values Missing
Gene 22 has 26233 (32.2%) missing values Missing
Gene 23 has 27594 (33.9%) missing values Missing
Gene 24 has 28899 (35.5%) missing values Missing
Gene 25 has 30382 (37.3%) missing values Missing
Gene 26 has 31833 (39.1%) missing values Missing
Gene 27 has 33309 (40.9%) missing values Missing
Gene 28 has 34822 (42.7%) missing values Missing
Gene 29 has 36112 (44.3%) missing values Missing
Gene 30 has 37671 (46.2%) missing values Missing
Gene 31 has 39334 (48.3%) missing values Missing
Gene 32 has 40821 (50.1%) missing values Missing
Gene 33 has 42331 (51.9%) missing values Missing
Gene 34 has 44048 (54.1%) missing values Missing
Gene 35 has 45558 (55.9%) missing values Missing
Gene 36 has 47060 (57.8%) missing values Missing
Gene 37 has 48829 (59.9%) missing values Missing
Gene 38 has 50642 (62.1%) missing values Missing
Gene 39 has 52723 (64.7%) missing values Missing
Gene 40 has 54995 (67.5%) missing values Missing
Gene 41 has 57565 (70.6%) missing values Missing
Gene 42 has 60190 (73.9%) missing values Missing
Gene 43 has 62818 (77.1%) missing values Missing
Gene 44 has 65633 (80.5%) missing values Missing
Gene 45 has 68181 (83.7%) missing values Missing
Gene 46 has 70511 (86.5%) missing values Missing
Gene 47 has 72774 (89.3%) missing values Missing
Gene 48 has 74658 (91.6%) missing values Missing
Gene 49 has 76305 (93.6%) missing values Missing
Gene 50 has 77850 (95.5%) missing values Missing
Gene 51 has 79016 (97.0%) missing values Missing
Gene 52 has 79738 (97.9%) missing values Missing
Gene 53 has 80302 (98.5%) missing values Missing
Gene 54 has 80721 (99.1%) missing values Missing
Gene 55 has 81001 (99.4%) missing values Missing
Gene 56 has 81212 (99.7%) missing values Missing
Gene 57 has 81369 (99.9%) missing values Missing
Gene 58 has 81432 (99.9%) missing values Missing
Gene 59 has 81464 (> 99.9%) missing values Missing
Gene 60 has 81481 (> 99.9%) missing values Missing
Gene 61 has 81484 (> 99.9%) missing values Missing
Gene 62 has 81485 (> 99.9%) missing values Missing
Index is uniformly distributed Uniform
Gene 62 is uniformly distributed Uniform
Index has unique values Unique
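
The cardinality and missing-value warnings above can be approximated with a short sketch. The threshold of 50 distinct values is an assumption: the report does not state it, and it is only chosen to be consistent with the smallest flagged count (Gene 56, 51 distinct values). The function names are hypothetical. Gene 3 has 813 missing values (1.0%) but no warning, so the tool presumably applies a cutoff to missing values as well; that cutoff is not reproduced here.

```python
N_OBSERVATIONS = 81488       # from the dataset statistics
CARDINALITY_THRESHOLD = 50   # assumption: not stated in the report


def format_pct(count, total):
    """Format a share the way the report does, including the edge labels."""
    text = f"{100 * count / total:.1f}%"
    if text == "100.0%" and count < total:
        return "> 99.9%"
    if text == "0.0%" and count > 0:
        return "< 0.1%"
    return text


def cardinality_warning(name, n_distinct, threshold=CARDINALITY_THRESHOLD):
    """Return a warning line if the variable exceeds the assumed threshold."""
    if n_distinct > threshold:
        return f"{name} has a high cardinality: {n_distinct} distinct values"
    return None


def missing_warning(name, n_missing, n_rows=N_OBSERVATIONS):
    """Format a missing-value warning line (no cutoff applied)."""
    return f"{name} has {n_missing} ({format_pct(n_missing, n_rows)}) missing values"


print(cardinality_warning("Gene 56", 51))
print(missing_warning("Gene 4", 1717))
print(missing_warning("Gene 59", 81464))
```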

Reproduction

Analysis started: 2021-01-14 18:16:54.838379
Analysis finished: 2021-01-14 18:17:44.228973
Duration: 49.39 seconds
Software version: pandas
Download configuration: config.yaml
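
The reported duration is the difference between the two timestamps. A small check using only the standard library:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction block.
started = datetime.strptime("2021-01-14 18:16:54.838379", FMT)
finished = datetime.strptime("2021-01-14 18:17:44.228973", FMT)

duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```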

Variables

Index
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 81488
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40744.5
Minimum: 1
Maximum: 81488
Zeros: 0
Zeros (%): 0.0%
Memory size: 636.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4075.35
Q1: 20372.75
Median: 40744.5
Q3: 61116.25
95-th percentile: 77413.65
Maximum: 81488
Range: 81487
Interquartile range (IQR): 40743.5
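
Because Index takes every integer from 1 to 81488 exactly once, these quantiles can be reproduced in closed form. A minimal sketch, assuming linear interpolation between order statistics (the report does not say which method it uses; this is the one that reproduces its figures):

```python
N = 81488  # number of observations; Index takes the values 1..N


def quantile(q, n=N):
    """q-th quantile of the sequence 1..n under linear interpolation."""
    return 1 + q * (n - 1)


q1, median, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
iqr = q3 - q1
value_range = quantile(1.0) - quantile(0.0)

print(quantile(0.05), q1, median, q3, quantile(0.95))
print(iqr, value_range)
```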

Descriptive statistics

Standard deviation: 23523.7037
Coefficient of variation (CV): 0.5773467267
Kurtosis: -1.2
Mean: 40744.5
Median Absolute Deviation (MAD): 20372
Skewness: 0
Sum: 3320187816
Variance: 553364636
Monotonicity: Strictly increasing
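
Most of the descriptive statistics for Index follow from the formulas for the integers 1..N. A minimal sketch, assuming the variance is the sample variance (divisor N - 1), which is the convention that matches the reported figure:

```python
import math
import statistics

N = 81488  # Index takes the values 1..N

total = N * (N + 1) // 2          # sum of 1..N
mean = total / N
variance = N * (N + 1) / 12       # sample variance (ddof=1) of 1..N
std = math.sqrt(variance)
cv = std / mean                   # coefficient of variation

# Median absolute deviation around the mean (which equals the median here).
mad = statistics.median(abs(x - mean) for x in range(1, N + 1))

print(total, mean, variance)
print(round(std, 4), round(cv, 10), mad)
```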
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
2047                  1      < 0.1%
76464                 1      < 0.1%
25241                 1      < 0.1%
31386                 1      < 0.1%
29339                 1      < 0.1%
19100                 1      < 0.1%
17053                 1      < 0.1%
23198                 1      < 0.1%
21151                 1      < 0.1%
74417                 1      < 0.1%
Other values (81478)  81478  > 99.9%

Value  Count  Frequency (%)
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%

Value  Count  Frequency (%)
81488  1      < 0.1%
81487  1      < 0.1%
81486  1      < 0.1%
81485  1      < 0.1%
81484  1      < 0.1%
81483  1      < 0.1%
81482  1      < 0.1%
81481  1      < 0.1%
81480  1      < 0.1%
81479  1      < 0.1%

Line
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 61
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.7 MiB

L42                2815
L41                2628
L40                2625
L39                2570
L43                2548
Other values (56)  68302

Length

Max length: 3
Median length: 3
Mean length: 2.855831533
Min length: 2

Characters and Unicode

Total characters: 232716
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
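
As an illustration of the character-level analysis, the sketch below counts characters by Unicode general category with Python's standard `unicodedata` module. It is not the report's own implementation; the helper name and the three sample labels are illustrative. The codes `Lu` and `Nd` correspond to "Uppercase Letter" and "Decimal Number". Script and block names are not exposed by `unicodedata`, so they are left out.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category (e.g. 'Lu', 'Nd')."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


sample = ["L1", "L42", "L43"]  # labels that appear in this report
counts = category_counts(sample)
is_ascii = all(ord(ch) < 128 for value in sample for ch in value)

print(counts)    # 'Lu' = Uppercase Letter, 'Nd' = Decimal Number
print(is_ascii)
```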

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: L1
2nd row: L1
3rd row: L1
4th row: L1
5th row: L1
Value              Count  Frequency (%)
L42                2815   3.5%
L41                2628   3.2%
L40                2625   3.2%
L39                2570   3.2%
L43                2548   3.1%
L44                2330   2.9%
L38                2272   2.8%
L45                2263   2.8%
L37                2081   2.6%
L46                1884   2.3%
Other values (51)  57472  70.5%
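
A table of this shape lists the ten most frequent values and folds the remainder into one "Other values" row. Here the ten listed counts sum to 24016, leaving 81488 - 24016 = 57472 for the other 51 values. A minimal sketch of that aggregation, with a hypothetical helper name and toy data:

```python
from collections import Counter


def top_n_with_other(values, n=10):
    """Return (value, count, percent) rows for the n most common values,
    followed by one aggregated row for everything else."""
    counts = Counter(values)
    total = sum(counts.values())
    top = counts.most_common(n)
    rows = [(value, count, 100 * count / total) for value, count in top]
    n_rest = len(counts) - len(top)
    if n_rest > 0:
        rest = total - sum(count for _, count in top)
        rows.append((f"Other values ({n_rest})", rest, 100 * rest / total))
    return rows


# Toy data: three distinct labels, keep the top two.
rows = top_n_with_other(["L42"] * 3 + ["L41"] * 2 + ["L40"], n=2)
for value, count, pct in rows:
    print(f"{value:<18}{count:>6}{pct:>7.1f}%")
```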
Histogram of lengths of the category
Value              Count  Frequency (%)
l42                2815   3.5%
l41                2628   3.2%
l40                2625   3.2%
l39                2570   3.2%
l43                2548   3.1%
l44                2330   2.9%
l38                2272   2.8%
l45                2263   2.8%
l37                2081   2.6%
l46                1884   2.3%
Other values (51)  57472  70.5%

Most occurring characters

Value  Count  Frequency (%)
L      81488  35.0%
4      29511  12.7%
3      26637  11.4%
2      22696  9.8%
1      21519  9.2%
5      10610  4.6%
8      8622   3.7%
9      8340   3.6%
6      8152   3.5%
7      7873   3.4%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    151228  65.0%
Uppercase Letter  81488   35.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
4      29511  19.5%
3      26637  17.6%
2      22696  15.0%
1      21519  14.2%
5      10610  7.0%
8      8622   5.7%
9      8340   5.5%
6      8152   5.4%
7      7873   5.2%
0      7268   4.8%

Uppercase Letter
Value  Count  Frequency (%)
L      81488  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  151228  65.0%
Latin   81488   35.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
4      29511  19.5%
3      26637  17.6%
2      22696  15.0%
1      21519  14.2%
5      10610  7.0%
8      8622   5.7%
9      8340   5.5%
6      8152   5.4%
7      7873   5.2%
0      7268   4.8%

Latin
Value  Count  Frequency (%)
L      81488  100.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  232716  100.0%

Most frequent character per block

Value  Count  Frequency (%)
L      81488  35.0%
4      29511  12.7%
3      26637  11.4%
2      22696  9.8%
1      21519  9.2%
5      10610  4.6%
8      8622   3.7%
9      8340   3.6%
6      8152   3.5%
7      7873   3.4%

Gene 1
Categorical

HIGH CARDINALITY

Distinct: 311
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.8 MiB

STAT3               3419
POU5F1              1546
TP53                1371
SOX2                1166
FOXO3               1157
Other values (306)  72829

Length

Max length: 12
Median length: 5
Mean length: 4.676799038
Min length: 2

Characters and Unicode

Total characters: 381103
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 41
Unique (%): 0.1%

Sample

1st row: DLX2
2nd row: GSX2
3rd row: ESR2
4th row: AR
5th row: ASCL1
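
The report distinguishes "Distinct" (311 for Gene 1) from "Unique" (41). A minimal sketch of the difference, assuming "Unique" counts values that occur exactly once in the column; the report does not define the term, and the helper name and toy data are illustrative:

```python
from collections import Counter


def count_unique(values):
    """Number of distinct values that occur exactly once."""
    counts = Counter(values)
    return sum(1 for count in counts.values() if count == 1)


# Toy column: STAT3 and TP53 repeat, DLX2 and GSX2 occur once.
genes = ["STAT3", "STAT3", "TP53", "DLX2", "TP53", "GSX2"]

n_distinct = len(set(genes))
n_unique = count_unique(genes)

print(n_distinct, n_unique)
```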
Value               Count  Frequency (%)
STAT3               3419   4.2%
POU5F1              1546   1.9%
TP53                1371   1.7%
SOX2                1166   1.4%
FOXO3               1157   1.4%
SMAD4               1156   1.4%
JUN                 1027   1.3%
CREB1               1018   1.2%
MYC                 1009   1.2%
SMAD3               916    1.1%
Other values (301)  67703  83.1%
Histogram of lengths of the category
Value               Count  Frequency (%)
stat3               3419   4.2%
pou5f1              1546   1.9%
tp53                1371   1.7%
sox2                1166   1.4%
foxo3               1157   1.4%
smad4               1156   1.4%
jun                 1027   1.3%
creb1               1018   1.2%
myc                 1009   1.2%
smad3               916    1.1%
Other values (301)  67703  83.1%

Most occurring characters

ValueCountFrequency (%)
A34443
 
9.0%
128630
 
7.5%
T26255
 
6.9%
F24193
 
6.3%
S22972
 
6.0%
O18846
 
4.9%
X17687
 
4.6%
N17192
 
4.5%
R16977
 
4.5%
215804
 
4.1%
Other values (26)158104
41.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter300917
79.0%
Decimal Number79161
 
20.8%
Dash Punctuation1025
 
0.3%

Most frequent character per category

ValueCountFrequency (%)
A34443
 
11.4%
T26255
 
8.7%
F24193
 
8.0%
S22972
 
7.6%
O18846
 
6.3%
X17687
 
5.9%
N17192
 
5.7%
R16977
 
5.6%
P13326
 
4.4%
C13311
 
4.4%
Other values (15)95715
31.8%
ValueCountFrequency (%)
128630
36.2%
215804
20.0%
314242
18.0%
57449
 
9.4%
45798
 
7.3%
62206
 
2.8%
71306
 
1.6%
81288
 
1.6%
91276
 
1.6%
01162
 
1.5%
ValueCountFrequency (%)
-1025
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin300917
79.0%
Common80186
 
21.0%

Most frequent character per script

ValueCountFrequency (%)
A34443
 
11.4%
T26255
 
8.7%
F24193
 
8.0%
S22972
 
7.6%
O18846
 
6.3%
X17687
 
5.9%
N17192
 
5.7%
R16977
 
5.6%
P13326
 
4.4%
C13311
 
4.4%
Other values (15)95715
31.8%
ValueCountFrequency (%)
128630
35.7%
215804
19.7%
314242
17.8%
57449
 
9.3%
45798
 
7.2%
62206
 
2.8%
71306
 
1.6%
81288
 
1.6%
91276
 
1.6%
01162
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII381103
100.0%

Most frequent character per block

ValueCountFrequency (%)
A34443
 
9.0%
128630
 
7.5%
T26255
 
6.9%
F24193
 
6.3%
S22972
 
6.0%
O18846
 
4.9%
X17687
 
4.6%
N17192
 
4.5%
R16977
 
4.5%
215804
 
4.1%
Other values (26)158104
41.5%

Gene 2
Categorical

HIGH CARDINALITY

Distinct: 291
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.8 MiB

MYC                 4231
TP53                2821
NANOG               2687
AR                  2091
FOXP3               2050
Other values (286)  67608

Length

Max length: 7
Median length: 5
Mean length: 4.452790595
Min length: 2

Characters and Unicode

Total characters: 362849
Distinct characters: 37
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 87
Unique (%): 0.1%

Sample

1st row: SNAI1
2nd row: SP8
3rd row: EGR1
4th row: TWIST1
5th row: INSM1
Value               Count  Frequency (%)
MYC                 4231   5.2%
TP53                2821   3.5%
NANOG               2687   3.3%
AR                  2091   2.6%
FOXP3               2050   2.5%
SNAI1               1845   2.3%
FOS                 1733   2.1%
PPARG               1684   2.1%
SNAI2               1666   2.0%
EGR1                1421   1.7%
Other values (281)  59259  72.7%
Histogram of lengths of the category
Value               Count  Frequency (%)
myc                 4231   5.2%
tp53                2821   3.5%
nanog               2687   3.3%
ar                  2091   2.6%
foxp3               2050   2.5%
snai1               1845   2.3%
fos                 1733   2.1%
pparg               1684   2.1%
snai2               1666   2.0%
egr1                1421   1.7%
Other values (281)  59259  72.7%

Most occurring characters

ValueCountFrequency (%)
A30520
 
8.4%
124649
 
6.8%
N22998
 
6.3%
F21934
 
6.0%
P20521
 
5.7%
R19647
 
5.4%
O19116
 
5.3%
S18706
 
5.2%
X17301
 
4.8%
T15981
 
4.4%
Other values (27)151476
41.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter289853
79.9%
Decimal Number70808
 
19.5%
Dash Punctuation2188
 
0.6%

Most frequent character per category

ValueCountFrequency (%)
A30520
 
10.5%
N22998
 
7.9%
F21934
 
7.6%
P20521
 
7.1%
R19647
 
6.8%
O19116
 
6.6%
S18706
 
6.5%
X17301
 
6.0%
T15981
 
5.5%
M13525
 
4.7%
Other values (16)89604
30.9%
ValueCountFrequency (%)
124649
34.8%
214018
19.8%
313403
18.9%
56921
 
9.8%
44800
 
6.8%
62413
 
3.4%
71559
 
2.2%
81279
 
1.8%
91125
 
1.6%
0641
 
0.9%
ValueCountFrequency (%)
-2188
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin289853
79.9%
Common72996
 
20.1%

Most frequent character per script

ValueCountFrequency (%)
A30520
 
10.5%
N22998
 
7.9%
F21934
 
7.6%
P20521
 
7.1%
R19647
 
6.8%
O19116
 
6.6%
S18706
 
6.5%
X17301
 
6.0%
T15981
 
5.5%
M13525
 
4.7%
Other values (16)89604
30.9%
ValueCountFrequency (%)
124649
33.8%
214018
19.2%
313403
18.4%
56921
 
9.5%
44800
 
6.6%
62413
 
3.3%
-2188
 
3.0%
71559
 
2.1%
81279
 
1.8%
91125
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII362849
100.0%

Most frequent character per block

ValueCountFrequency (%)
A30520
 
8.4%
124649
 
6.8%
N22998
 
6.3%
F21934
 
6.0%
P20521
 
5.7%
R19647
 
5.4%
O19116
 
5.3%
S18706
 
5.2%
X17301
 
4.8%
T15981
 
4.4%
Other values (27)151476
41.7%

Gene 3
Categorical

HIGH CARDINALITY

Distinct: 233
Distinct (%): 0.3%
Missing: 813
Missing (%): 1.0%
Memory size: 4.8 MiB

NANOG               12659
SNAI1               8178
TP53                4448
POU5F1              4385
PPARG               2670
Other values (228)  48335

Length

Max length: 7
Median length: 5
Mean length: 4.775419895
Min length: 2

Characters and Unicode

Total characters: 385257
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 34
Unique (%): < 0.1%

Sample

1st row: FOXA1
2nd row: SPDEF
3rd row: E2F7
4th row: ZBTB7B
5th row: E2F1
Value               Count  Frequency (%)
NANOG               12659  15.5%
SNAI1               8178   10.0%
TP53                4448   5.5%
POU5F1              4385   5.4%
PPARG               2670   3.3%
IRF8                2146   2.6%
SNAI2               2136   2.6%
SALL4               1973   2.4%
EGR1                1887   2.3%
KLF2                1770   2.2%
Other values (223)  38423  47.2%
Histogram of lengths of the category
Value               Count  Frequency (%)
nanog               12659  15.7%
snai1               8178   10.1%
tp53                4448   5.5%
pou5f1              4385   5.4%
pparg               2670   3.3%
irf8                2146   2.7%
snai2               2136   2.6%
sall4               1973   2.4%
egr1                1887   2.3%
klf2                1770   2.2%
Other values (223)  38423  47.6%
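
The two Gene 3 tables give different percentages for the same counts (NANOG is 15.5% in the first and 15.7% in the second). The figures are consistent with the first table dividing by all 81488 rows and the second dividing by the 80675 non-missing rows. This is an inference from the numbers, not something the report states. A quick check:

```python
N_ROWS = 81488       # number of observations
N_MISSING = 813      # missing values in Gene 3
NANOG_COUNT = 12659  # count of NANOG in Gene 3

# Share of all rows versus share of non-missing rows.
pct_all_rows = 100 * NANOG_COUNT / N_ROWS
pct_non_missing = 100 * NANOG_COUNT / (N_ROWS - N_MISSING)

print(f"{pct_all_rows:.1f}% of all rows")
print(f"{pct_non_missing:.1f}% of non-missing rows")
```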

Most occurring characters

ValueCountFrequency (%)
N45217
 
11.7%
A42137
 
10.9%
130287
 
7.9%
O27830
 
7.2%
S23133
 
6.0%
P22141
 
5.7%
G21088
 
5.5%
F18073
 
4.7%
I15753
 
4.1%
T15702
 
4.1%
Other values (26)123896
32.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter312247
81.0%
Decimal Number71997
 
18.7%
Dash Punctuation1013
 
0.3%

Most frequent character per category

ValueCountFrequency (%)
N45217
14.5%
A42137
13.5%
O27830
8.9%
S23133
 
7.4%
P22141
 
7.1%
G21088
 
6.8%
F18073
 
5.8%
I15753
 
5.0%
T15702
 
5.0%
R15700
 
5.0%
Other values (15)65473
21.0%
ValueCountFrequency (%)
130287
42.1%
211734
 
16.3%
311618
 
16.1%
59201
 
12.8%
43215
 
4.5%
82172
 
3.0%
61447
 
2.0%
91189
 
1.7%
7857
 
1.2%
0277
 
0.4%
ValueCountFrequency (%)
-1013
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin312247
81.0%
Common73010
 
19.0%

Most frequent character per script

ValueCountFrequency (%)
N45217
14.5%
A42137
13.5%
O27830
8.9%
S23133
 
7.4%
P22141
 
7.1%
G21088
 
6.8%
F18073
 
5.8%
I15753
 
5.0%
T15702
 
5.0%
R15700
 
5.0%
Other values (15)65473
21.0%
ValueCountFrequency (%)
130287
41.5%
211734
 
16.1%
311618
 
15.9%
59201
 
12.6%
43215
 
4.4%
82172
 
3.0%
61447
 
2.0%
91189
 
1.6%
-1013
 
1.4%
7857
 
1.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII385257
100.0%

Most frequent character per block

ValueCountFrequency (%)
N45217
 
11.7%
A42137
 
10.9%
130287
 
7.9%
O27830
 
7.2%
S23133
 
6.0%
P22141
 
5.7%
G21088
 
5.5%
F18073
 
4.7%
I15753
 
4.1%
T15702
 
4.1%
Other values (26)123896
32.2%

Gene 4
Categorical

HIGH CARDINALITY
MISSING

Distinct: 186
Distinct (%): 0.2%
Missing: 1717
Missing (%): 2.1%
Memory size: 4.8 MiB

NANOG               17907
POU5F1              13939
SALL4               4807
SNAI1               4401
TBX21               3389
Other values (181)  35328

Length

Max length: 7
Median length: 5
Mean length: 5.014228228
Min length: 2

Characters and Unicode

Total characters: 399990
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 16
Unique (%): < 0.1%

Sample

1st row: SALL1
2nd row: FOXE3
3rd row: TBR1
4th row: HMGA1
5th row: HIC1
Value               Count  Frequency (%)
NANOG               17907  22.0%
POU5F1              13939  17.1%
SALL4               4807   5.9%
SNAI1               4401   5.4%
TBX21               3389   4.2%
TP53                3052   3.7%
STAT1               2136   2.6%
FOXM1               1874   2.3%
NKX6-1              1568   1.9%
STAT3               1269   1.6%
Other values (176)  25429  31.2%
(Missing)           1717   2.1%
Histogram of lengths of the category
Value               Count  Frequency (%)
nanog               17907  22.4%
pou5f1              13939  17.5%
sall4               4807   6.0%
snai1               4401   5.5%
tbx21               3389   4.2%
tp53                3052   3.8%
stat1               2136   2.7%
foxm1               1874   2.3%
nkx6-1              1568   2.0%
stat3               1269   1.6%
Other values (176)  25429  31.9%

Most occurring characters

ValueCountFrequency (%)
N48845
12.2%
O41371
 
10.3%
A38441
 
9.6%
135989
 
9.0%
F23193
 
5.8%
P22911
 
5.7%
G21388
 
5.3%
U17633
 
4.4%
517274
 
4.3%
S17205
 
4.3%
Other values (26)115740
28.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter317179
79.3%
Decimal Number81203
 
20.3%
Dash Punctuation1608
 
0.4%

Most frequent character per category

ValueCountFrequency (%)
N48845
15.4%
O41371
13.0%
A38441
12.1%
F23193
 
7.3%
P22911
 
7.2%
G21388
 
6.7%
U17633
 
5.6%
S17205
 
5.4%
T14553
 
4.6%
X13153
 
4.1%
Other values (15)58486
18.4%
ValueCountFrequency (%)
135989
44.3%
517274
21.3%
28952
 
11.0%
38306
 
10.2%
45166
 
6.4%
62545
 
3.1%
71169
 
1.4%
9851
 
1.0%
8795
 
1.0%
0156
 
0.2%
ValueCountFrequency (%)
-1608
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin317179
79.3%
Common82811
 
20.7%

Most frequent character per script

ValueCountFrequency (%)
N48845
15.4%
O41371
13.0%
A38441
12.1%
F23193
 
7.3%
P22911
 
7.2%
G21388
 
6.7%
U17633
 
5.6%
S17205
 
5.4%
T14553
 
4.6%
X13153
 
4.1%
Other values (15)58486
18.4%
ValueCountFrequency (%)
135989
43.5%
517274
20.9%
28952
 
10.8%
38306
 
10.0%
45166
 
6.2%
62545
 
3.1%
-1608
 
1.9%
71169
 
1.4%
9851
 
1.0%
8795
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII399990
100.0%

Most frequent character per block

ValueCountFrequency (%)
N48845
12.2%
O41371
 
10.3%
A38441
 
9.6%
135989
 
9.0%
F23193
 
5.8%
P22911
 
5.7%
G21388
 
5.3%
U17633
 
4.4%
517274
 
4.3%
S17205
 
4.3%
Other values (26)115740
28.9%

Gene 5
Categorical

HIGH CARDINALITY
MISSING

Distinct: 156
Distinct (%): 0.2%
Missing: 2918
Missing (%): 3.6%
Memory size: 4.7 MiB

POU5F1              16966
SALL4               14052
NANOG               11160
RUNX3               3389
TBX21               3197
Other values (151)  29806

Length

Max length: 7
Median length: 5
Mean length: 5.058546519
Min length: 2

Characters and Unicode

Total characters: 397450
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 18
Unique (%): < 0.1%

Sample

1st row: E2F7
2nd row: REL
3rd row: SPDEF
4th row: TET2
5th row: ZFP57
Value               Count  Frequency (%)
POU5F1              16966  20.8%
SALL4               14052  17.2%
NANOG               11160  13.7%
RUNX3               3389   4.2%
TBX21               3197   3.9%
MYC                 2591   3.2%
SNAI1               1986   2.4%
NEUROG3             1575   1.9%
STAT3               1571   1.9%
FOXM1               1301   1.6%
Other values (146)  20782  25.5%
(Missing)           2918   3.6%
Histogram of lengths of the category
Value               Count  Frequency (%)
pou5f1              16966  21.6%
sall4               14052  17.9%
nanog               11160  14.2%
runx3               3389   4.3%
tbx21               3197   4.1%
myc                 2591   3.3%
snai1               1986   2.5%
neurog3             1575   2.0%
stat3               1571   2.0%
foxm1               1301   1.7%
Other values (146)  20782  26.5%

Most occurring characters

ValueCountFrequency (%)
O35546
 
8.9%
N34667
 
8.7%
A34183
 
8.6%
131888
 
8.0%
L29822
 
7.5%
U23953
 
6.0%
F23702
 
6.0%
S23665
 
6.0%
P21305
 
5.4%
518414
 
4.6%
Other values (26)120305
30.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter310655
78.2%
Decimal Number85532
 
21.5%
Dash Punctuation1263
 
0.3%

Most frequent character per category

ValueCountFrequency (%)
O35546
11.4%
N34667
11.2%
A34183
11.0%
L29822
9.6%
U23953
7.7%
F23702
7.6%
S23665
7.6%
P21305
 
6.9%
G14632
 
4.7%
X13790
 
4.4%
Other values (15)55390
17.8%
ValueCountFrequency (%)
131888
37.3%
518414
21.5%
414499
17.0%
28734
 
10.2%
38444
 
9.9%
91275
 
1.5%
61255
 
1.5%
7852
 
1.0%
8149
 
0.2%
022
 
< 0.1%
ValueCountFrequency (%)
-1263
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin310655
78.2%
Common86795
 
21.8%

Most frequent character per script

ValueCountFrequency (%)
O35546
11.4%
N34667
11.2%
A34183
11.0%
L29822
9.6%
U23953
7.7%
F23702
7.6%
S23665
7.6%
P21305
 
6.9%
G14632
 
4.7%
X13790
 
4.4%
Other values (15)55390
17.8%
ValueCountFrequency (%)
131888
36.7%
518414
21.2%
414499
16.7%
28734
 
10.1%
38444
 
9.7%
91275
 
1.5%
-1263
 
1.5%
61255
 
1.4%
7852
 
1.0%
8149
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII397450
100.0%

Most frequent character per block

ValueCountFrequency (%)
O35546
 
8.9%
N34667
 
8.7%
A34183
 
8.6%
131888
 
8.0%
L29822
 
7.5%
U23953
 
6.0%
F23702
 
6.0%
S23665
 
6.0%
P21305
 
5.4%
518414
 
4.6%
Other values (26)120305
30.3%

Gene 6
Categorical

HIGH CARDINALITY
MISSING

Distinct: 125
Distinct (%): 0.2%
Missing: 4211
Missing (%): 5.2%
Memory size: 4.7 MiB

SALL4               12447
MYC                 11592
POU5F1              9366
NANOG               7569
STAT3               5248
Other values (120)  31055

Length

Max length: 7
Median length: 5
Mean length: 4.689959496
Min length: 2

Characters and Unicode

Total characters: 362426
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 12
Unique (%): < 0.1%

Sample

1st row: ZIC3
2nd row: FEV
3rd row: PLAG1
4th row: RORC
5th row: IRF2
Value               Count  Frequency (%)
SALL4               12447  15.3%
MYC                 11592  14.2%
POU5F1              9366   11.5%
NANOG               7569   9.3%
STAT3               5248   6.4%
SNAI1               4283   5.3%
EGR1                3362   4.1%
RUNX3               3197   3.9%
FOXM1               2088   2.6%
TBX21               1547   1.9%
Other values (115)  16578  20.3%
(Missing)           4211   5.2%
Histogram of lengths of the category
Value               Count  Frequency (%)
sall4               12447  16.1%
myc                 11592  15.0%
pou5f1              9366   12.1%
nanog               7569   9.8%
stat3               5248   6.8%
snai1               4283   5.5%
egr1                3362   4.4%
runx3               3197   4.1%
foxm1               2088   2.7%
tbx21               1547   2.0%
Other values (115)  16578  21.5%

Most occurring characters

ValueCountFrequency (%)
A34325
 
9.5%
S28714
 
7.9%
N27597
 
7.6%
L26219
 
7.2%
125889
 
7.1%
O22052
 
6.1%
T18130
 
5.0%
U15982
 
4.4%
M14561
 
4.0%
F14553
 
4.0%
Other values (26)134404
37.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter294180
81.2%
Decimal Number67678
 
18.7%
Dash Punctuation568
 
0.2%

Most frequent character per category

ValueCountFrequency (%)
A34325
11.7%
S28714
 
9.8%
N27597
 
9.4%
L26219
 
8.9%
O22052
 
7.5%
T18130
 
6.2%
U15982
 
5.4%
M14561
 
4.9%
F14553
 
4.9%
G12216
 
4.2%
Other values (15)79831
27.1%
ValueCountFrequency (%)
125889
38.3%
412510
18.5%
310361
15.3%
510062
 
14.9%
26128
 
9.1%
91223
 
1.8%
7916
 
1.4%
6568
 
0.8%
011
 
< 0.1%
810
 
< 0.1%
ValueCountFrequency (%)
-568
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin294180
81.2%
Common68246
 
18.8%

Most frequent character per script

ValueCountFrequency (%)
A34325
11.7%
S28714
 
9.8%
N27597
 
9.4%
L26219
 
8.9%
O22052
 
7.5%
T18130
 
6.2%
U15982
 
5.4%
M14561
 
4.9%
F14553
 
4.9%
G12216
 
4.2%
Other values (15)79831
27.1%
ValueCountFrequency (%)
125889
37.9%
412510
18.3%
310361
15.2%
510062
 
14.7%
26128
 
9.0%
91223
 
1.8%
7916
 
1.3%
6568
 
0.8%
-568
 
0.8%
011
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII362426
100.0%

Most frequent character per block

ValueCountFrequency (%)
A34325
 
9.5%
S28714
 
7.9%
N27597
 
7.6%
L26219
 
7.2%
125889
 
7.1%
O22052
 
6.1%
T18130
 
5.0%
U15982
 
4.4%
M14561
 
4.0%
F14553
 
4.0%
Other values (26)134404
37.1%

Gene 7
Categorical

HIGH CARDINALITY
MISSING

Distinct: 119
Distinct (%): 0.2%
Missing: 5354
Missing (%): 6.6%
Memory size: 4.6 MiB

SNAI1               12273
MYC                 11617
SALL4               8587
POU5F1              5505
NANOG               4939
Other values (114)  33213

Length

Max length: 7
Median length: 5
Mean length: 4.587319726
Min length: 2

Characters and Unicode

Total characters: 349251
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 12
Unique (%): < 0.1%

Sample

1st row: ZFP57
2nd row: E2F1
3rd row: PLAG1
4th row: PRDM16
5th row: HMGA1
Value               Count  Frequency (%)
SNAI1               12273  15.1%
MYC                 11617  14.3%
SALL4               8587   10.5%
POU5F1              5505   6.8%
NANOG               4939   6.1%
TWIST1              4123   5.1%
TP53                3326   4.1%
EGR1                3171   3.9%
ZEB1                3140   3.9%
FOXM1               2246   2.8%
Other values (109)  17207  21.1%
(Missing)           5354   6.6%
Histogram of lengths of the category
Value               Count  Frequency (%)
snai1               12273  16.1%
myc                 11617  15.3%
sall4               8587   11.3%
pou5f1              5505   7.2%
nanog               4939   6.5%
twist1              4123   5.4%
tp53                3326   4.4%
egr1                3171   4.2%
zeb1                3140   4.1%
foxm1               2246   3.0%
Other values (109)  17207  22.6%

Most occurring characters

ValueCountFrequency (%)
134448
 
9.9%
A32645
 
9.3%
S32516
 
9.3%
N27387
 
7.8%
T21169
 
6.1%
L17982
 
5.1%
I17915
 
5.1%
O15600
 
4.5%
M14371
 
4.1%
C12240
 
3.5%
Other values (26)122978
35.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter280652
80.4%
Decimal Number68539
 
19.6%
Dash Punctuation60
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
A32645
11.6%
S32516
11.6%
N27387
 
9.8%
T21169
 
7.5%
L17982
 
6.4%
I17915
 
6.4%
O15600
 
5.6%
M14371
 
5.1%
C12240
 
4.4%
Y11849
 
4.2%
Other values (15)76978
27.4%
ValueCountFrequency (%)
134448
50.3%
58956
 
13.1%
48722
 
12.7%
37665
 
11.2%
25782
 
8.4%
92089
 
3.0%
7547
 
0.8%
6175
 
0.3%
8135
 
0.2%
020
 
< 0.1%
ValueCountFrequency (%)
-60
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin280652
80.4%
Common68599
 
19.6%

Most frequent character per script

ValueCountFrequency (%)
A32645
11.6%
S32516
11.6%
N27387
 
9.8%
T21169
 
7.5%
L17982
 
6.4%
I17915
 
6.4%
O15600
 
5.6%
M14371
 
5.1%
C12240
 
4.4%
Y11849
 
4.2%
Other values (15)76978
27.4%
ValueCountFrequency (%)
134448
50.2%
58956
 
13.1%
48722
 
12.7%
37665
 
11.2%
25782
 
8.4%
92089
 
3.0%
7547
 
0.8%
6175
 
0.3%
8135
 
0.2%
-60
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII349251
100.0%

Most frequent character per block

ValueCountFrequency (%)
134448
 
9.9%
A32645
 
9.3%
S32516
 
9.3%
N27387
 
7.8%
T21169
 
6.1%
L17982
 
5.1%
I17915
 
5.1%
O15600
 
4.5%
M14371
 
4.1%
C12240
 
3.5%
Other values (26)122978
35.2%

Gene 8
Categorical

HIGH CARDINALITY
MISSING

Distinct: 115
Distinct (%): 0.2%
Missing: 6874
Missing (%): 8.4%
Memory size: 4.6 MiB

SNAI1               10661
ZEB1                8944
MYC                 8482
NANOG               5399
TP53                5107
Other values (110)  36021

Length

Max length: 7
Median length: 5
Mean length: 4.438067923
Min length: 2

Characters and Unicode

Total characters: 331142
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13
Unique (%): < 0.1%

Sample

1st row: TBX20
2nd row: SPDEF
3rd row: TFAP4
4th row: ARID3A
5th row: TCF7L1
Value               Count  Frequency (%)
SNAI1               10661  13.1%
ZEB1                8944   11.0%
MYC                 8482   10.4%
NANOG               5399   6.6%
TP53                5107   6.3%
SALL4               4982   6.1%
FOXM1               4558   5.6%
POU5F1              4075   5.0%
AR                  3072   3.8%
KLF2                2888   3.5%
Other values (105)  16446  20.2%
(Missing)           6874   8.4%
Histogram of lengths of the category
Value               Count  Frequency (%)
snai1               10661  14.3%
zeb1                8944   12.0%
myc                 8482   11.4%
nanog               5399   7.2%
tp53                5107   6.8%
sall4               4982   6.7%
foxm1               4558   6.1%
pou5f1              4075   5.5%
ar                  3072   4.1%
klf2                2888   3.9%
Other values (105)  16446  22.0%

Most occurring characters

ValueCountFrequency (%)
134666
 
10.5%
A28099
 
8.5%
N26937
 
8.1%
S23270
 
7.0%
O16664
 
5.0%
T14346
 
4.3%
I13890
 
4.2%
M13687
 
4.1%
L13312
 
4.0%
F12548
 
3.8%
Other values (26)133723
40.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter262774
79.4%
Decimal Number68316
 
20.6%
Dash Punctuation52
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
A28099
 
10.7%
N26937
 
10.3%
S23270
 
8.9%
O16664
 
6.3%
T14346
 
5.5%
I13890
 
5.3%
M13687
 
5.2%
L13312
 
5.1%
F12548
 
4.8%
X12232
 
4.7%
Other values (15)87789
33.4%
ValueCountFrequency (%)
134666
50.7%
59329
 
13.7%
38126
 
11.9%
27950
 
11.6%
45131
 
7.5%
92357
 
3.5%
7400
 
0.6%
6275
 
0.4%
045
 
0.1%
837
 
0.1%
ValueCountFrequency (%)
-52
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin262774
79.4%
Common68368
 
20.6%

Most frequent character per script

ValueCountFrequency (%)
A28099
 
10.7%
N26937
 
10.3%
S23270
 
8.9%
O16664
 
6.3%
T14346
 
5.5%
I13890
 
5.3%
M13687
 
5.2%
L13312
 
5.1%
F12548
 
4.8%
X12232
 
4.7%
Other values (15)87789
33.4%
ValueCountFrequency (%)
134666
50.7%
59329
 
13.6%
38126
 
11.9%
27950
 
11.6%
45131
 
7.5%
92357
 
3.4%
7400
 
0.6%
6275
 
0.4%
-52
 
0.1%
045
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII331142
100.0%

Most frequent character per block

ValueCountFrequency (%)
134666
 
10.5%
A28099
 
8.5%
N26937
 
8.1%
S23270
 
7.0%
O16664
 
5.0%
T14346
 
4.3%
I13890
 
4.2%
M13687
 
4.1%
L13312
 
4.0%
F12548
 
3.8%
Other values (26)133723
40.4%

Gene 9
Categorical

HIGH CARDINALITY
MISSING

Distinct: 116
Distinct (%): 0.2%
Missing: 8391
Missing (%): 10.3%
Memory size: 4.5 MiB

AR: 8728
ZEB1: 7407
SNAI1: 5424
POU5F1: 5088
MYC: 4982
Other values (111): 41468

Length

Max length: 7
Median length: 4
Mean length: 4.308767802
Min length: 2

Characters and Unicode

Total characters: 314958
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 19
Unique (%): < 0.1%

Sample

1st row: TFAP4
2nd row: TFAP4
3rd row: SPDEF
4th row: SP110
5th row: FOXF1
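
The length and character statistics for Gene 9 can be cross-checked against each other. The sketch below assumes that the mean length is the total character count divided by the number of non-missing values, and that the non-missing count is the 81488 observations from the overview minus the reported missing cells; both are assumptions about how the report derives its figures.

```python
import math

# Figures copied from this report.
n_observations = 81488              # dataset overview
n_missing = 8391                    # Gene 9, "Missing"
total_characters = 314958           # Gene 9, "Total characters"
reported_mean_length = 4.308767802  # Gene 9, "Mean length"

# Assumed relation: mean length = total characters / non-missing values.
n_non_missing = n_observations - n_missing
mean_length = total_characters / n_non_missing

print(n_non_missing, mean_length)
print(math.isclose(mean_length, reported_mean_length, abs_tol=1e-8))
```

The same check can be repeated for any other Gene column by swapping in its missing count, total characters and mean length.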
Value               Count  Frequency (%)
AR                   8728  10.7%
ZEB1                 7407  9.1%
SNAI1                5424  6.7%
POU5F1               5088  6.2%
MYC                  4982  6.1%
SOX9                 4516  5.5%
KLF2                 4437  5.4%
TP53                 4228  5.2%
TBX21                3980  4.9%
SALL4                3917  4.8%
Other values (106)  20390  25.0%
(Missing)            8391  10.3%
Histogram of lengths of the category
ValueCountFrequency (%)
ar8728
11.9%
zeb17407
 
10.1%
snai15424
 
7.4%
pou5f15088
 
7.0%
myc4982
 
6.8%
sox94516
 
6.2%
klf24437
 
6.1%
tp534228
 
5.8%
tbx213980
 
5.4%
sall43917
 
5.4%
Other values (106)20390
27.9%

Most occurring characters

ValueCountFrequency (%)
131306
 
9.9%
A25942
 
8.2%
S21268
 
6.8%
T19250
 
6.1%
N18020
 
5.7%
O15082
 
4.8%
R14614
 
4.6%
X13805
 
4.4%
213330
 
4.2%
L12958
 
4.1%
Other values (26)129383
41.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter245028
77.8%
Decimal Number69787
 
22.2%
Dash Punctuation143
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
A25942
 
10.6%
S21268
 
8.7%
T19250
 
7.9%
N18020
 
7.4%
O15082
 
6.2%
R14614
 
6.0%
X13805
 
5.6%
L12958
 
5.3%
F12673
 
5.2%
I11757
 
4.8%
Other values (15)79659
32.5%
ValueCountFrequency (%)
131306
44.9%
213330
19.1%
59560
 
13.7%
35802
 
8.3%
94714
 
6.8%
44177
 
6.0%
6414
 
0.6%
7412
 
0.6%
848
 
0.1%
024
 
< 0.1%
ValueCountFrequency (%)
-143
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin245028
77.8%
Common69930
 
22.2%

Most frequent character per script

ValueCountFrequency (%)
A25942
 
10.6%
S21268
 
8.7%
T19250
 
7.9%
N18020
 
7.4%
O15082
 
6.2%
R14614
 
6.0%
X13805
 
5.6%
L12958
 
5.3%
F12673
 
5.2%
I11757
 
4.8%
Other values (15)79659
32.5%
ValueCountFrequency (%)
131306
44.8%
213330
19.1%
59560
 
13.7%
35802
 
8.3%
94714
 
6.7%
44177
 
6.0%
6414
 
0.6%
7412
 
0.6%
-143
 
0.2%
848
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII314958
100.0%

Most frequent character per block

ValueCountFrequency (%)
131306
 
9.9%
A25942
 
8.2%
S21268
 
6.8%
T19250
 
6.1%
N18020
 
5.7%
O15082
 
4.8%
R14614
 
4.6%
X13805
 
4.4%
213330
 
4.2%
L12958
 
4.1%
Other values (26)129383
41.1%

Gene 10
Categorical

HIGH CARDINALITY
MISSING

Distinct: 103
Distinct (%): 0.1%
Missing: 10114
Missing (%): 12.4%
Memory size: 4.5 MiB

TWIST1: 7893
AR: 7228
TBX21: 5422
SALL4: 5137
SNAI1: 4872
Other values (98): 40822

Length

Max length: 7
Median length: 5
Mean length: 4.561703141
Min length: 2

Characters and Unicode

Total characters: 325587
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 14
Unique (%): < 0.1%

Sample

1st row: IRF6
2nd row: IRF5
3rd row: E2F1
4th row: IRF6
5th row: IRF6
Value               Count  Frequency (%)
TWIST1               7893  9.7%
AR                   7228  8.9%
TBX21                5422  6.7%
SALL4                5137  6.3%
SNAI1                4872  6.0%
RUNX2                4128  5.1%
RUNX3                3965  4.9%
MYC                  3916  4.8%
ZEB1                 3848  4.7%
KLF2                 3659  4.5%
Other values (93)   21306  26.1%
(Missing)           10114  12.4%
Histogram of lengths of the category
ValueCountFrequency (%)
twist17893
 
11.1%
ar7228
 
10.1%
tbx215422
 
7.6%
sall45137
 
7.2%
snai14872
 
6.8%
runx24128
 
5.8%
runx33965
 
5.6%
myc3916
 
5.5%
zeb13848
 
5.4%
klf23659
 
5.1%
Other values (93)21306
29.9%

Most occurring characters

ValueCountFrequency (%)
133933
 
10.4%
T29356
 
9.0%
A24830
 
7.6%
S23759
 
7.3%
X19196
 
5.9%
N19076
 
5.9%
R17530
 
5.4%
215863
 
4.9%
I15547
 
4.8%
L14414
 
4.4%
Other values (26)112083
34.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter256047
78.6%
Decimal Number69475
 
21.3%
Dash Punctuation65
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
T29356
11.5%
A24830
 
9.7%
S23759
 
9.3%
X19196
 
7.5%
N19076
 
7.5%
R17530
 
6.8%
I15547
 
6.1%
L14414
 
5.6%
F12327
 
4.8%
U12122
 
4.7%
Other values (15)67890
26.5%
ValueCountFrequency (%)
133933
48.8%
215863
22.8%
36458
 
9.3%
55539
 
8.0%
45484
 
7.9%
91682
 
2.4%
6265
 
0.4%
7169
 
0.2%
043
 
0.1%
839
 
0.1%
ValueCountFrequency (%)
-65
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin256047
78.6%
Common69540
 
21.4%
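
The script totals above follow directly from the category totals for Gene 10. A minimal sketch, assuming every uppercase letter in the column is an ASCII letter (Latin script) and that digits and the hyphen fall under the Common script, as they do in the Unicode script property; the mapping dictionary is illustrative.

```python
# Category totals copied from the "Most occurring categories" table for Gene 10.
category_counts = {
    "Uppercase Letter": 256047,
    "Decimal Number": 69475,
    "Dash Punctuation": 65,
}

# Assumed mapping from general category to script for this ASCII-only column.
CATEGORY_TO_SCRIPT = {
    "Uppercase Letter": "Latin",
    "Decimal Number": "Common",
    "Dash Punctuation": "Common",
}

# Sum category counts into their scripts.
script_counts = {}
for category, count in category_counts.items():
    script = CATEGORY_TO_SCRIPT[category]
    script_counts[script] = script_counts.get(script, 0) + count

total_characters = sum(category_counts.values())
print(script_counts, total_characters)
```

Both totals agree with the report: the Common count equals digits plus hyphens, and the grand total matches "Total characters" for Gene 10.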

Most frequent character per script

ValueCountFrequency (%)
T29356
11.5%
A24830
 
9.7%
S23759
 
9.3%
X19196
 
7.5%
N19076
 
7.5%
R17530
 
6.8%
I15547
 
6.1%
L14414
 
5.6%
F12327
 
4.8%
U12122
 
4.7%
Other values (15)67890
26.5%
ValueCountFrequency (%)
133933
48.8%
215863
22.8%
36458
 
9.3%
55539
 
8.0%
45484
 
7.9%
91682
 
2.4%
6265
 
0.4%
7169
 
0.2%
-65
 
0.1%
043
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII325587
100.0%

Most frequent character per block

ValueCountFrequency (%)
133933
 
10.4%
T29356
 
9.0%
A24830
 
7.6%
S23759
 
7.3%
X19196
 
5.9%
N19076
 
5.9%
R17530
 
5.4%
215863
 
4.9%
I15547
 
4.8%
L14414
 
4.4%
Other values (26)112083
34.4%

Gene 11
Categorical

HIGH CARDINALITY
MISSING

Distinct: 103
Distinct (%): 0.1%
Missing: 11748
Missing (%): 14.4%
Memory size: 4.5 MiB

FOXM1: 7562
TWIST1: 6446
RUNX3: 5410
TBX21: 5375
MYC: 5244
Other values (98): 39703

Length

Max length: 7
Median length: 5
Mean length: 4.592213937
Min length: 2

Characters and Unicode

Total characters: 320261
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): < 0.1%

Sample

1st row: TCF7
2nd row: PAX4
3rd row: IRF6
4th row: ZBTB49
5th row: IRF2
Value               Count  Frequency (%)
FOXM1                7562  9.3%
TWIST1               6446  7.9%
RUNX3                5410  6.6%
TBX21                5375  6.6%
MYC                  5244  6.4%
EGR1                 4635  5.7%
SNAI2                4086  5.0%
AR                   3761  4.6%
SALL4                3415  4.2%
SOX9                 3055  3.7%
Other values (93)   20751  25.5%
(Missing)           11748  14.4%
Histogram of lengths of the category
ValueCountFrequency (%)
foxm17562
 
10.8%
twist16446
 
9.2%
runx35410
 
7.8%
tbx215375
 
7.7%
myc5244
 
7.5%
egr14635
 
6.6%
snai24086
 
5.9%
ar3761
 
5.4%
sall43415
 
4.9%
sox93055
 
4.4%
Other values (93)20751
29.8%

Most occurring characters

ValueCountFrequency (%)
136118
 
11.3%
T27165
 
8.5%
X24048
 
7.5%
S21968
 
6.9%
A19565
 
6.1%
N18154
 
5.7%
R16000
 
5.0%
O15879
 
5.0%
I13600
 
4.2%
M12928
 
4.0%
Other values (26)114836
35.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter252136
78.7%
Decimal Number67952
 
21.2%
Dash Punctuation173
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
T27165
 
10.8%
X24048
 
9.5%
S21968
 
8.7%
A19565
 
7.8%
N18154
 
7.2%
R16000
 
6.3%
O15879
 
6.3%
I13600
 
5.4%
M12928
 
5.1%
F12655
 
5.0%
Other values (15)70174
27.8%
ValueCountFrequency (%)
136118
53.2%
212675
 
18.7%
37985
 
11.8%
53932
 
5.8%
43543
 
5.2%
93130
 
4.6%
7244
 
0.4%
6217
 
0.3%
091
 
0.1%
817
 
< 0.1%
ValueCountFrequency (%)
-173
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin252136
78.7%
Common68125
 
21.3%

Most frequent character per script

ValueCountFrequency (%)
T27165
 
10.8%
X24048
 
9.5%
S21968
 
8.7%
A19565
 
7.8%
N18154
 
7.2%
R16000
 
6.3%
O15879
 
6.3%
I13600
 
5.4%
M12928
 
5.1%
F12655
 
5.0%
Other values (15)70174
27.8%
ValueCountFrequency (%)
136118
53.0%
212675
 
18.6%
37985
 
11.7%
53932
 
5.8%
43543
 
5.2%
93130
 
4.6%
7244
 
0.4%
6217
 
0.3%
-173
 
0.3%
091
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII320261
100.0%

Most frequent character per block

ValueCountFrequency (%)
136118
 
11.3%
T27165
 
8.5%
X24048
 
7.5%
S21968
 
6.9%
A19565
 
6.1%
N18154
 
5.7%
R16000
 
5.0%
O15879
 
5.0%
I13600
 
4.2%
M12928
 
4.0%
Other values (26)114836
35.9%

Gene 12
Categorical

HIGH CARDINALITY
MISSING

Distinct: 105
Distinct (%): 0.2%
Missing: 13093
Missing (%): 16.1%
Memory size: 4.4 MiB

SOX9: 7692
FOXM1: 6333
RUNX3: 5367
EGR1: 5318
STAT1: 4086
Other values (100): 39599

Length

Max length: 7
Median length: 5
Mean length: 4.564441845
Min length: 2

Characters and Unicode

Total characters: 312185
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11
Unique (%): < 0.1%

Sample

1st row: HIC1
2nd row: SOX14
3rd row: ARID5B
4th row: TCF7L1
5th row: BACH2
Value               Count  Frequency (%)
SOX9                 7692  9.4%
FOXM1                6333  7.8%
RUNX3                5367  6.6%
EGR1                 5318  6.5%
STAT1                4086  5.0%
TP53                 3840  4.7%
TWIST1               3586  4.4%
MYC                  3415  4.2%
TBX21                3352  4.1%
CEBPA                2930  3.6%
Other values (95)   22476  27.6%
(Missing)           13093  16.1%
Histogram of lengths of the category
ValueCountFrequency (%)
sox97692
 
11.2%
foxm16333
 
9.3%
runx35367
 
7.8%
egr15318
 
7.8%
stat14086
 
6.0%
tp533840
 
5.6%
twist13586
 
5.2%
myc3415
 
5.0%
tbx213352
 
4.9%
cebpa2930
 
4.3%
Other values (95)22476
32.9%

Most occurring characters

ValueCountFrequency (%)
131209
 
10.0%
X27009
 
8.7%
T24607
 
7.9%
S21659
 
6.9%
O19155
 
6.1%
A17624
 
5.6%
R17531
 
5.6%
N14988
 
4.8%
F12091
 
3.9%
E11586
 
3.7%
Other values (25)114726
36.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter244478
78.3%
Decimal Number67537
 
21.6%
Dash Punctuation170
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
X27009
 
11.0%
T24607
 
10.1%
S21659
 
8.9%
O19155
 
7.8%
A17624
 
7.2%
R17531
 
7.2%
N14988
 
6.1%
F12091
 
4.9%
E11586
 
4.7%
P10675
 
4.4%
Other values (14)67553
27.6%
ValueCountFrequency (%)
131209
46.2%
310741
 
15.9%
29395
 
13.9%
97772
 
11.5%
56008
 
8.9%
41909
 
2.8%
6192
 
0.3%
8120
 
0.2%
7109
 
0.2%
082
 
0.1%
ValueCountFrequency (%)
-170
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin244478
78.3%
Common67707
 
21.7%

Most frequent character per script

ValueCountFrequency (%)
X27009
 
11.0%
T24607
 
10.1%
S21659
 
8.9%
O19155
 
7.8%
A17624
 
7.2%
R17531
 
7.2%
N14988
 
6.1%
F12091
 
4.9%
E11586
 
4.7%
P10675
 
4.4%
Other values (14)67553
27.6%
ValueCountFrequency (%)
131209
46.1%
310741
 
15.9%
29395
 
13.9%
97772
 
11.5%
56008
 
8.9%
41909
 
2.8%
6192
 
0.3%
-170
 
0.3%
8120
 
0.2%
7109
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII312185
100.0%

Most frequent character per block

ValueCountFrequency (%)
131209
 
10.0%
X27009
 
8.7%
T24607
 
7.9%
S21659
 
6.9%
O19155
 
6.1%
A17624
 
5.6%
R17531
 
5.6%
N14988
 
4.8%
F12091
 
3.9%
E11586
 
3.7%
Other values (25)114726
36.7%

Gene 13
Categorical

HIGH CARDINALITY
MISSING

Distinct: 104
Distinct (%): 0.2%
Missing: 14336
Missing (%): 17.6%
Memory size: 4.4 MiB

RUNX2: 6790
SOX9: 6359
TBX21: 5181
EGR1: 4767
CEBPA: 4010
Other values (99): 40045

Length

Max length: 7
Median length: 5
Mean length: 4.709479986
Min length: 2

Characters and Unicode

Total characters: 316251
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 12
Unique (%): < 0.1%

Sample

1st row: NKX6-1
2nd row: FOXA1
3rd row: ARID3A
4th row: IRF2
5th row: FEV
Value               Count  Frequency (%)
RUNX2                6790  8.3%
SOX9                 6359  7.8%
TBX21                5181  6.4%
EGR1                 4767  5.8%
CEBPA                4010  4.9%
FOXM1                3702  4.5%
RUNX3                3346  4.1%
TP53                 3032  3.7%
PPARG                3030  3.7%
TWIST1               2864  3.5%
Other values (94)   24071  29.5%
(Missing)           14336  17.6%
Histogram of lengths of the category
ValueCountFrequency (%)
runx26790
 
10.1%
sox96359
 
9.5%
tbx215181
 
7.7%
egr14767
 
7.1%
cebpa4010
 
6.0%
foxm13702
 
5.5%
runx33346
 
5.0%
tp533032
 
4.5%
pparg3030
 
4.5%
twist12864
 
4.3%
Other values (94)24071
35.8%

Most occurring characters

ValueCountFrequency (%)
X27390
 
8.7%
124661
 
7.8%
R23246
 
7.4%
N19596
 
6.2%
A18379
 
5.8%
S17648
 
5.6%
T17613
 
5.6%
217198
 
5.4%
O16017
 
5.1%
P15569
 
4.9%
Other values (25)118934
37.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter250697
79.3%
Decimal Number65156
 
20.6%
Dash Punctuation398
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
X27390
 
10.9%
R23246
 
9.3%
N19596
 
7.8%
A18379
 
7.3%
S17648
 
7.0%
T17613
 
7.0%
O16017
 
6.4%
P15569
 
6.2%
U13302
 
5.3%
E13157
 
5.2%
Other values (14)68780
27.4%
ValueCountFrequency (%)
124661
37.8%
217198
26.4%
39023
 
13.8%
96517
 
10.0%
54100
 
6.3%
42109
 
3.2%
81052
 
1.6%
6304
 
0.5%
7115
 
0.2%
077
 
0.1%
ValueCountFrequency (%)
-398
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin250697
79.3%
Common65554
 
20.7%

Most frequent character per script

ValueCountFrequency (%)
X27390
 
10.9%
R23246
 
9.3%
N19596
 
7.8%
A18379
 
7.3%
S17648
 
7.0%
T17613
 
7.0%
O16017
 
6.4%
P15569
 
6.2%
U13302
 
5.3%
E13157
 
5.2%
Other values (14)68780
27.4%
ValueCountFrequency (%)
124661
37.6%
217198
26.2%
39023
 
13.8%
96517
 
9.9%
54100
 
6.3%
42109
 
3.2%
81052
 
1.6%
-398
 
0.6%
6304
 
0.5%
7115
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII316251
100.0%

Most frequent character per block

ValueCountFrequency (%)
X27390
 
8.7%
124661
 
7.8%
R23246
 
7.4%
N19596
 
6.2%
A18379
 
5.8%
S17648
 
5.6%
T17613
 
5.6%
217198
 
5.4%
O16017
 
5.1%
P15569
 
4.9%
Other values (25)118934
37.6%

Gene 14
Categorical

HIGH CARDINALITY
MISSING

Distinct: 118
Distinct (%): 0.2%
Missing: 15683
Missing (%): 19.2%
Memory size: 4.3 MiB

SNAI2: 6719
RUNX2: 5373
RUNX3: 5169
SOX9: 4580
PPARG: 4008
Other values (113): 39956

Length

Max length: 7
Median length: 5
Mean length: 4.671028037
Min length: 2

Characters and Unicode

Total characters: 307377
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10
Unique (%): < 0.1%

Sample

1st row: PAX4
2nd row: PLAGL1
3rd row: ARID3A
4th row: FEV
5th row: TBR1
Value               Count  Frequency (%)
SNAI2                6719  8.2%
RUNX2                5373  6.6%
RUNX3                5169  6.3%
SOX9                 4580  5.6%
PPARG                4008  4.9%
EGR1                 3109  3.8%
FOXM1                3007  3.7%
CEBPA                2832  3.5%
FOS                  2582  3.2%
STAT1                2574  3.2%
Other values (108)  25852  31.7%
(Missing)           15683  19.2%
Histogram of lengths of the category
ValueCountFrequency (%)
snai26719
 
10.2%
runx25373
 
8.2%
runx35169
 
7.9%
sox94580
 
7.0%
pparg4008
 
6.1%
egr13109
 
4.7%
foxm13007
 
4.6%
cebpa2832
 
4.3%
fos2582
 
3.9%
stat12574
 
3.9%
Other values (108)25852
39.3%

Most occurring characters

ValueCountFrequency (%)
N25667
 
8.4%
R23474
 
7.6%
A22922
 
7.5%
S22869
 
7.4%
X21345
 
6.9%
121050
 
6.8%
216538
 
5.4%
O15407
 
5.0%
P14424
 
4.7%
U13852
 
4.5%
Other values (26)109829
35.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter249925
81.3%
Decimal Number57073
 
18.6%
Dash Punctuation379
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
N25667
10.3%
R23474
 
9.4%
A22922
 
9.2%
S22869
 
9.2%
X21345
 
8.5%
O15407
 
6.2%
P14424
 
5.8%
U13852
 
5.5%
I13542
 
5.4%
T13258
 
5.3%
Other values (15)63165
25.3%
ValueCountFrequency (%)
121050
36.9%
216538
29.0%
38529
14.9%
94714
 
8.3%
53123
 
5.5%
81325
 
2.3%
41145
 
2.0%
6364
 
0.6%
7245
 
0.4%
040
 
0.1%
ValueCountFrequency (%)
-379
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin249925
81.3%
Common57452
 
18.7%

Most frequent character per script

ValueCountFrequency (%)
N25667
10.3%
R23474
 
9.4%
A22922
 
9.2%
S22869
 
9.2%
X21345
 
8.5%
O15407
 
6.2%
P14424
 
5.8%
U13852
 
5.5%
I13542
 
5.4%
T13258
 
5.3%
Other values (15)63165
25.3%
ValueCountFrequency (%)
121050
36.6%
216538
28.8%
38529
14.8%
94714
 
8.2%
53123
 
5.4%
81325
 
2.3%
41145
 
2.0%
-379
 
0.7%
6364
 
0.6%
7245
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII307377
100.0%

Most frequent character per block

ValueCountFrequency (%)
N25667
 
8.4%
R23474
 
7.6%
A22922
 
7.5%
S22869
 
7.4%
X21345
 
6.9%
121050
 
6.8%
216538
 
5.4%
O15407
 
5.0%
P14424
 
4.7%
U13852
 
4.5%
Other values (26)109829
35.7%

Gene 15
Categorical

HIGH CARDINALITY
MISSING

Distinct: 127
Distinct (%): 0.2%
Missing: 17067
Missing (%): 20.9%
Memory size: 4.3 MiB

STAT1: 6719
SNAI2: 5314
EGR1: 5013
SOX9: 4260
PPARG: 3828
Other values (122): 39287

Length

Max length: 7
Median length: 5
Mean length: 4.776703249
Min length: 2

Characters and Unicode

Total characters: 307720
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6
Unique (%): < 0.1%

Sample

1st row: IRF6
2nd row: PAX4
3rd row: FEV
4th row: PAX4
5th row: FEV
Value               Count  Frequency (%)
STAT1                6719  8.2%
SNAI2                5314  6.5%
EGR1                 5013  6.2%
SOX9                 4260  5.2%
PPARG                3828  4.7%
RUNX2                3047  3.7%
TBX21                2632  3.2%
TP53                 2590  3.2%
PRDM1                2556  3.1%
RUNX3                2080  2.6%
Other values (117)  26382  32.4%
(Missing)           17067  20.9%
Histogram of lengths of the category
ValueCountFrequency (%)
stat16719
 
10.4%
snai25314
 
8.2%
egr15013
 
7.8%
sox94260
 
6.6%
pparg3828
 
5.9%
runx23047
 
4.7%
tbx212632
 
4.1%
tp532590
 
4.0%
prdm12556
 
4.0%
runx32080
 
3.2%
Other values (117)26382
41.0%

Most occurring characters

ValueCountFrequency (%)
129775
 
9.7%
A24168
 
7.9%
S23863
 
7.8%
T22660
 
7.4%
N21281
 
6.9%
R20522
 
6.7%
X17002
 
5.5%
P15833
 
5.1%
O14844
 
4.8%
212779
 
4.2%
Other values (26)104993
34.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter243615
79.2%
Decimal Number62331
 
20.3%
Dash Punctuation1774
 
0.6%

Most frequent character per category

ValueCountFrequency (%)
A24168
9.9%
S23863
 
9.8%
T22660
 
9.3%
N21281
 
8.7%
R20522
 
8.4%
X17002
 
7.0%
P15833
 
6.5%
O14844
 
6.1%
G12519
 
5.1%
E10926
 
4.5%
Other values (15)59997
24.6%
ValueCountFrequency (%)
129775
47.8%
212779
20.5%
36861
 
11.0%
94336
 
7.0%
53845
 
6.2%
42639
 
4.2%
61516
 
2.4%
7315
 
0.5%
8224
 
0.4%
041
 
0.1%
ValueCountFrequency (%)
-1774
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin243615
79.2%
Common64105
 
20.8%

Most frequent character per script

ValueCountFrequency (%)
A24168
9.9%
S23863
 
9.8%
T22660
 
9.3%
N21281
 
8.7%
R20522
 
8.4%
X17002
 
7.0%
P15833
 
6.5%
O14844
 
6.1%
G12519
 
5.1%
E10926
 
4.5%
Other values (15)59997
24.6%
ValueCountFrequency (%)
129775
46.4%
212779
19.9%
36861
 
10.7%
94336
 
6.8%
53845
 
6.0%
42639
 
4.1%
-1774
 
2.8%
61516
 
2.4%
7315
 
0.5%
8224
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII307720
100.0%

Most frequent character per block

ValueCountFrequency (%)
129775
 
9.7%
A24168
 
7.9%
S23863
 
7.8%
T22660
 
7.4%
N21281
 
6.9%
R20522
 
6.7%
X17002
 
5.5%
P15833
 
5.1%
O14844
 
4.8%
212779
 
4.2%
Other values (26)104993
34.1%

Gene 16
Categorical

HIGH CARDINALITY
MISSING

Distinct: 131
Distinct (%): 0.2%
Missing: 18340
Missing (%): 22.5%
Memory size: 4.3 MiB

TBX21: 7012
STAT1: 5314
CEBPA: 3976
FOS: 3142
NEUROG3: 3085
Other values (126): 40619

Length

Max length: 7
Median length: 5
Mean length: 4.883606765
Min length: 2

Characters and Unicode

Total characters: 308390
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): < 0.1%

Sample

1st row: CEBPD
2nd row: SP110
3rd row: HIC1
4th row: PLAGL1
5th row: BACH2
Value               Count  Frequency (%)
TBX21                7012  8.6%
STAT1                5314  6.5%
CEBPA                3976  4.9%
FOS                  3142  3.9%
NEUROG3              3085  3.8%
SNAI2                3010  3.7%
RUNX3                2606  3.2%
PPARG                2606  3.2%
XBP1                 2438  3.0%
RUNX2                2430  3.0%
Other values (121)  27529  33.8%
(Missing)           18340  22.5%
Histogram of lengths of the category
ValueCountFrequency (%)
tbx217012
 
11.1%
stat15314
 
8.4%
cebpa3976
 
6.3%
fos3142
 
5.0%
neurog33085
 
4.9%
snai23010
 
4.8%
runx32606
 
4.1%
pparg2606
 
4.1%
xbp12438
 
3.9%
runx22430
 
3.8%
Other values (121)27529
43.6%

Most occurring characters

ValueCountFrequency (%)
131311
 
10.2%
T24239
 
7.9%
X21452
 
7.0%
A20321
 
6.6%
S19207
 
6.2%
N18228
 
5.9%
P17053
 
5.5%
R16407
 
5.3%
O15307
 
5.0%
215300
 
5.0%
Other values (26)109565
35.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter243585
79.0%
Decimal Number63031
 
20.4%
Dash Punctuation1774
 
0.6%

Most frequent character per category

ValueCountFrequency (%)
T24239
 
10.0%
X21452
 
8.8%
A20321
 
8.3%
S19207
 
7.9%
N18228
 
7.5%
P17053
 
7.0%
R16407
 
6.7%
O15307
 
6.3%
B14567
 
6.0%
E11637
 
4.8%
Other values (15)65167
26.8%
ValueCountFrequency (%)
131311
49.7%
215300
24.3%
38176
 
13.0%
53212
 
5.1%
91884
 
3.0%
41482
 
2.4%
61054
 
1.7%
7397
 
0.6%
8131
 
0.2%
084
 
0.1%
ValueCountFrequency (%)
-1774
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin243585
79.0%
Common64805
 
21.0%

Most frequent character per script

ValueCountFrequency (%)
T24239
 
10.0%
X21452
 
8.8%
A20321
 
8.3%
S19207
 
7.9%
N18228
 
7.5%
P17053
 
7.0%
R16407
 
6.7%
O15307
 
6.3%
B14567
 
6.0%
E11637
 
4.8%
Other values (15)65167
26.8%
ValueCountFrequency (%)
131311
48.3%
215300
23.6%
38176
 
12.6%
53212
 
5.0%
91884
 
2.9%
-1774
 
2.7%
41482
 
2.3%
61054
 
1.6%
7397
 
0.6%
8131
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII308390
100.0%

Most frequent character per block

ValueCountFrequency (%)
131311
 
10.2%
T24239
 
7.9%
X21452
 
7.0%
A20321
 
6.6%
S19207
 
6.2%
N18228
 
5.9%
P17053
 
5.5%
R16407
 
5.3%
O15307
 
5.0%
215300
 
5.0%
Other values (26)109565
35.5%

Gene 17
Categorical

HIGH CARDINALITY
MISSING

Distinct: 137
Distinct (%): 0.2%
Missing: 19674
Missing (%): 24.1%
Memory size: 4.2 MiB

RUNX3: 6938
TBX21: 5992
FOXM1: 4391
PPARG: 4176
PRDM1: 3110
Other values (132): 37207

Length

Max length: 7
Median length: 5
Mean length: 4.704079982
Min length: 2

Characters and Unicode

Total characters: 290778
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13
Unique (%): < 0.1%

Sample

1st row: FEV
2nd row: PAX4
3rd row: SOX14
4th row: MXI1
5th row: IRF2
Value               Count  Frequency (%)
RUNX3                6938  8.5%
TBX21                5992  7.4%
FOXM1                4391  5.4%
PPARG                4176  5.1%
PRDM1                3110  3.8%
STAT1                3010  3.7%
EGR1                 2650  3.3%
SP7                  2438  3.0%
SNAI2                2403  2.9%
FOS                  2306  2.8%
Other values (127)  24400  29.9%
(Missing)           19674  24.1%
Histogram of lengths of the category
ValueCountFrequency (%)
runx36938
 
11.2%
tbx215992
 
9.7%
foxm14391
 
7.1%
pparg4176
 
6.8%
prdm13110
 
5.0%
stat13010
 
4.9%
egr12650
 
4.3%
sp72438
 
3.9%
snai22403
 
3.9%
fos2306
 
3.7%
Other values (127)24400
39.5%

Most occurring characters

ValueCountFrequency (%)
127739
 
9.5%
X24848
 
8.5%
R20944
 
7.2%
P19237
 
6.6%
A18109
 
6.2%
T17696
 
6.1%
S17478
 
6.0%
N16495
 
5.7%
O13137
 
4.5%
212196
 
4.2%
Other values (26)102899
35.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter228051
78.4%
Decimal Number61266
 
21.1%
Dash Punctuation1461
 
0.5%

Most frequent character per category

ValueCountFrequency (%)
X24848
10.9%
R20944
 
9.2%
P19237
 
8.4%
A18109
 
7.9%
T17696
 
7.8%
S17478
 
7.7%
N16495
 
7.2%
O13137
 
5.8%
U11095
 
4.9%
F10764
 
4.7%
Other values (15)58248
25.5%
ValueCountFrequency (%)
127739
45.3%
212196
19.9%
311175
18.2%
42545
 
4.2%
72522
 
4.1%
52377
 
3.9%
91534
 
2.5%
61082
 
1.8%
079
 
0.1%
817
 
< 0.1%
ValueCountFrequency (%)
-1461
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin228051
78.4%
Common62727
 
21.6%

Most frequent character per script

ValueCountFrequency (%)
X24848
10.9%
R20944
 
9.2%
P19237
 
8.4%
A18109
 
7.9%
T17696
 
7.8%
S17478
 
7.7%
N16495
 
7.2%
O13137
 
5.8%
U11095
 
4.9%
F10764
 
4.7%
Other values (15)58248
25.5%
ValueCountFrequency (%)
127739
44.2%
212196
19.4%
311175
17.8%
42545
 
4.1%
72522
 
4.0%
52377
 
3.8%
91534
 
2.4%
-1461
 
2.3%
61082
 
1.7%
079
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII290778
100.0%

Most frequent character per block

ValueCountFrequency (%)
127739
 
9.5%
X24848
 
8.5%
R20944
 
7.2%
P19237
 
6.6%
A18109
 
6.2%
T17696
 
6.1%
S17478
 
6.0%
N16495
 
5.7%
O13137
 
4.5%
212196
 
4.2%
Other values (26)102899
35.4%

Gene 18
Categorical

HIGH CARDINALITY
MISSING

Distinct: 145
Distinct (%): 0.2%
Missing: 21029
Missing (%): 25.8%
Memory size: 4.2 MiB

EGR1: 6486
RUNX3: 5926
TP53: 3957
TBX21: 3620
SOX9: 2945
Other values (140): 37525

Length

Max length: 7
Median length: 5
Mean length: 4.563340446
Min length: 2

Characters and Unicode

Total characters: 275895
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 17
Unique (%): < 0.1%

Sample

1st row: TCF7
2nd row: ARID5B
3rd row: ARID5B
4th row: ZEB2
5th row: TBR1
Value               Count  Frequency (%)
EGR1                 6486  8.0%
RUNX3                5926  7.3%
TP53                 3957  4.9%
TBX21                3620  4.4%
SOX9                 2945  3.6%
CEBPA                2553  3.1%
SATB2                2412  3.0%
STAT1                2402  2.9%
PRDM1                2282  2.8%
MSX1                 2279  2.8%
Other values (135)  25597  31.4%
(Missing)           21029  25.8%
Histogram of lengths of the category
ValueCountFrequency (%)
egr16486
 
10.7%
runx35926
 
9.8%
tp533957
 
6.5%
tbx213620
 
6.0%
sox92945
 
4.9%
cebpa2553
 
4.2%
satb22412
 
4.0%
stat12402
 
4.0%
prdm12282
 
3.8%
msx12279
 
3.8%
Other values (135)25597
42.3%

Most occurring characters

ValueCountFrequency (%)
124280
 
8.8%
X22179
 
8.0%
R20278
 
7.3%
T20263
 
7.3%
S17841
 
6.5%
A16905
 
6.1%
P15772
 
5.7%
313926
 
5.0%
211911
 
4.3%
E11803
 
4.3%
Other values (26)100737
36.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter213052
77.2%
Decimal Number61599
 
22.3%
Dash Punctuation1244
 
0.5%

Most frequent character per category

ValueCountFrequency (%)
X22179
 
10.4%
R20278
 
9.5%
T20263
 
9.5%
S17841
 
8.4%
A16905
 
7.9%
P15772
 
7.4%
E11803
 
5.5%
N11748
 
5.5%
B10361
 
4.9%
G9841
 
4.6%
Other values (15)56061
26.3%
ValueCountFrequency (%)
124280
39.4%
313926
22.6%
211911
19.3%
54547
 
7.4%
93009
 
4.9%
41692
 
2.7%
71238
 
2.0%
6939
 
1.5%
040
 
0.1%
817
 
< 0.1%
ValueCountFrequency (%)
-1244
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin213052
77.2%
Common62843
 
22.8%

Most frequent character per script

ValueCountFrequency (%)
X22179
 
10.4%
R20278
 
9.5%
T20263
 
9.5%
S17841
 
8.4%
A16905
 
7.9%
P15772
 
7.4%
E11803
 
5.5%
N11748
 
5.5%
B10361
 
4.9%
G9841
 
4.6%
Other values (15)56061
26.3%
ValueCountFrequency (%)
124280
38.6%
313926
22.2%
211911
19.0%
54547
 
7.2%
93009
 
4.8%
41692
 
2.7%
-1244
 
2.0%
71238
 
2.0%
6939
 
1.5%
040
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII275895
100.0%

Most frequent character per block

ValueCountFrequency (%)
124280
 
8.8%
X22179
 
8.0%
R20278
 
7.3%
T20263
 
7.3%
S17841
 
6.5%
A16905
 
6.1%
P15772
 
5.7%
313926
 
5.0%
211911
 
4.3%
E11803
 
4.3%
Other values (26)100737
36.5%

Gene 19
Categorical

HIGH CARDINALITY
MISSING

Distinct: 146
Distinct (%): 0.2%
Missing: 22335
Missing (%): 27.4%
Memory size: 4.2 MiB

CEBPA: 6485
RUNX2: 5159
EGR1: 4789
RUNX3: 3579
KLF2: 3341
Other values (141): 35800

Length

Max length: 7
Median length: 5
Mean length: 4.613629064
Min length: 2

Characters and Unicode

Total characters: 272910
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 20
Unique (%): < 0.1%

Sample

1st row: LIN28B
2nd row: IRF2
3rd row: HES6
4th row: ARID5B
5th row: ARID3A
Value               Count  Frequency (%)
CEBPA                6485  8.0%
RUNX2                5159  6.3%
EGR1                 4789  5.9%
RUNX3                3579  4.4%
KLF2                 3341  4.1%
TBX21                2735  3.4%
PPARG                2648  3.2%
PAX3                 2272  2.8%
TP53                 2180  2.7%
MSX1                 1696  2.1%
Other values (136)  24269  29.8%
(Missing)           22335  27.4%
Histogram of lengths of the category
ValueCountFrequency (%)
cebpa6485
 
11.0%
runx25159
 
8.7%
egr14789
 
8.1%
runx33579
 
6.1%
klf23341
 
5.6%
tbx212735
 
4.6%
pparg2648
 
4.5%
pax32272
 
3.8%
tp532180
 
3.7%
msx11696
 
2.9%
Other values (136)24269
41.0%

Most occurring characters

ValueCountFrequency (%)
X22075
 
8.1%
A21439
 
7.9%
R21258
 
7.8%
P20524
 
7.5%
118821
 
6.9%
215150
 
5.6%
T13259
 
4.9%
E13216
 
4.8%
N13097
 
4.8%
312103
 
4.4%
Other values (26)101968
37.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter218601
80.1%
Decimal Number53231
 
19.5%
Dash Punctuation1078
 
0.4%

Most frequent character per category

ValueCountFrequency (%)
X22075
 
10.1%
A21439
 
9.8%
R21258
 
9.7%
P20524
 
9.4%
T13259
 
6.1%
E13216
 
6.0%
N13097
 
6.0%
B11436
 
5.2%
S10540
 
4.8%
F10237
 
4.7%
Other values (15)61520
28.1%
ValueCountFrequency (%)
118821
35.4%
215150
28.5%
312103
22.7%
52508
 
4.7%
41636
 
3.1%
91088
 
2.0%
71007
 
1.9%
8608
 
1.1%
6272
 
0.5%
038
 
0.1%
ValueCountFrequency (%)
-1078
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin218601
80.1%
Common54309
 
19.9%

Most frequent character per script

ValueCountFrequency (%)
X22075
 
10.1%
A21439
 
9.8%
R21258
 
9.7%
P20524
 
9.4%
T13259
 
6.1%
E13216
 
6.0%
N13097
 
6.0%
B11436
 
5.2%
S10540
 
4.8%
F10237
 
4.7%
Other values (15)61520
28.1%
ValueCountFrequency (%)
118821
34.7%
215150
27.9%
312103
22.3%
52508
 
4.6%
41636
 
3.0%
91088
 
2.0%
-1078
 
2.0%
71007
 
1.9%
8608
 
1.1%
6272
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII272910
100.0%

Most frequent character per block

ValueCountFrequency (%)
X22075
 
8.1%
A21439
 
7.9%
R21258
 
7.8%
P20524
 
7.5%
118821
 
6.9%
215150
 
5.6%
T13259
 
4.9%
E13216
 
4.8%
N13097
 
4.8%
312103
 
4.4%
Other values (26)101968
37.4%

Gene 20
Categorical

HIGH CARDINALITY
MISSING

Distinct: 147
Distinct (%): 0.3%
Missing: 23841
Missing (%): 29.3%
Memory size: 4.1 MiB

PPARG: 6307
SNAI2: 5102
CEBPA: 4787
TP53: 3380
GATA4: 3209
Other values (142): 34862

Length

Max length: 7
Median length: 5
Mean length: 4.65345985
Min length: 2

Characters and Unicode

Total characters: 268258
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 20
Unique (%): < 0.1%

Sample

1st row: HES6
2nd row: IRF2
3rd row: TET1
4th row: HMGA1
5th row: DPF3
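
The Distinct (%) and Missing (%) figures reported for Gene 20 appear to use different denominators. The sketch below is an inference from the numbers, not a statement of how the report tool is implemented: Missing (%) matches a division by all 81488 observations, while Distinct (%) only matches when dividing by the non-missing values.

```python
# Figures copied from this report.
n_observations = 81488   # dataset overview
n_distinct = 147         # Gene 20, "Distinct"
n_missing = 23841        # Gene 20, "Missing"

n_non_missing = n_observations - n_missing

# Missing share of all rows.
missing_pct = round(100 * n_missing / n_observations, 1)

# Two candidate denominators for the distinct share.
distinct_pct_non_missing = round(100 * n_distinct / n_non_missing, 1)
distinct_pct_all_rows = round(100 * n_distinct / n_observations, 1)

print(missing_pct, distinct_pct_non_missing, distinct_pct_all_rows)
```

Only the non-missing denominator reproduces the reported 0.3%; dividing by all rows would give 0.2%.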
Value               Count  Frequency (%)
PPARG                6307  7.7%
SNAI2                5102  6.3%
CEBPA                4787  5.9%
TP53                 3380  4.1%
GATA4                3209  3.9%
RUNX3                2705  3.3%
EGR1                 2328  2.9%
MYOD1                2320  2.8%
RUNX2                2015  2.5%
KLF2                 1913  2.3%
Other values (137)  23581  28.9%
(Missing)           23841  29.3%
Histogram of lengths of the category
ValueCountFrequency (%)
pparg6307
 
10.9%
snai25102
 
8.9%
cebpa4787
 
8.3%
tp533380
 
5.9%
gata43209
 
5.6%
runx32705
 
4.7%
egr12328
 
4.0%
myod12320
 
4.0%
runx22015
 
3.5%
klf21913
 
3.3%
Other values (137)23581
40.9%

Most occurring characters

ValueCountFrequency (%)
A30211
 
11.3%
P26489
 
9.9%
R18169
 
6.8%
X16030
 
6.0%
115428
 
5.8%
213058
 
4.9%
G12899
 
4.8%
T12623
 
4.7%
N11865
 
4.4%
F11017
 
4.1%
Other values (26)100469
37.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter217099
80.9%
Decimal Number50145
 
18.7%
Dash Punctuation1014
 
0.4%

Most frequent character per category

ValueCountFrequency (%)
A30211
13.9%
P26489
12.2%
R18169
 
8.4%
X16030
 
7.4%
G12899
 
5.9%
T12623
 
5.8%
N11865
 
5.5%
F11017
 
5.1%
O10777
 
5.0%
S10064
 
4.6%
Other values (15)56955
26.2%
ValueCountFrequency (%)
115428
30.8%
213058
26.0%
310845
21.6%
53904
 
7.8%
43399
 
6.8%
81550
 
3.1%
91059
 
2.1%
7717
 
1.4%
6155
 
0.3%
030
 
0.1%
ValueCountFrequency (%)
-1014
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin217099
80.9%
Common51159
 
19.1%

Most frequent character per script

ValueCountFrequency (%)
A30211
13.9%
P26489
12.2%
R18169
 
8.4%
X16030
 
7.4%
G12899
 
5.9%
T12623
 
5.8%
N11865
 
5.5%
F11017
 
5.1%
O10777
 
5.0%
S10064
 
4.6%
Other values (15)56955
26.2%
ValueCountFrequency (%)
115428
30.2%
213058
25.5%
310845
21.2%
53904
 
7.6%
43399
 
6.6%
81550
 
3.0%
91059
 
2.1%
-1014
 
2.0%
7717
 
1.4%
6155
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII268258
100.0%

Most frequent character per block

ValueCountFrequency (%)
A30211
 
11.3%
P26489
 
9.9%
R18169
 
6.8%
X16030
 
6.0%
115428
 
5.8%
213058
 
4.9%
G12899
 
4.8%
T12623
 
4.7%
N11865
 
4.4%
F11017
 
4.1%
Other values (26)100469
37.5%

Gene 21
Categorical

HIGH CARDINALITY
MISSING

Distinct 145
Distinct (%) 0.3%
Missing 25145
Missing (%) 30.9%
Memory size 4.1 MiB

STAT1 5102
TP53 4878
PPARG 4230
KLF2 3421
MEF2C 3166
Other values (140) 35546

Length

Max length 7
Median length 5
Mean length 4.578758675
Min length 2

Characters and Unicode

Total characters 257981
Distinct characters 36
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 17
Unique (%) < 0.1%

Sample

1st row NKX6-1
2nd row HEY1
3rd row E2F7
4th row LIN28B
5th row SPDEF
Value Count Frequency (%)
STAT1 5102 6.3%
TP53 4878 6.0%
PPARG 4230 5.2%
KLF2 3421 4.2%
MEF2C 3166 3.9%
CEBPA 2322 2.8%
MYOG 2239 2.7%
EGR1 2238 2.7%
SNAI2 1986 2.4%
FOS 1861 2.3%
Other values (135) 24900 30.6%
(Missing) 25145 30.9%

Histogram of lengths of the category

Value Count Frequency (%)
stat1 5102 9.1%
tp53 4878 8.7%
pparg 4230 7.5%
klf2 3421 6.1%
mef2c 3166 5.6%
cebpa 2322 4.1%
myog 2239 4.0%
egr1 2238 4.0%
snai2 1986 3.5%
fos 1861 3.3%
Other values (135) 24900 44.2%

Most occurring characters

ValueCountFrequency (%)
A22855
 
8.9%
T20190
 
7.8%
P19247
 
7.5%
117683
 
6.9%
F16944
 
6.6%
214234
 
5.5%
R13203
 
5.1%
S12294
 
4.8%
O11995
 
4.6%
G11744
 
4.6%
Other values (26)97592
37.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter202736
78.6%
Decimal Number53830
 
20.9%
Dash Punctuation1415
 
0.5%

Most frequent character per category

ValueCountFrequency (%)
A22855
11.3%
T20190
 
10.0%
P19247
 
9.5%
F16944
 
8.4%
R13203
 
6.5%
S12294
 
6.1%
O11995
 
5.9%
G11744
 
5.8%
X11478
 
5.7%
E9187
 
4.5%
Other values (15)53599
26.4%
ValueCountFrequency (%)
117683
32.8%
214234
26.4%
310028
18.6%
56103
 
11.3%
41957
 
3.6%
81854
 
3.4%
7997
 
1.9%
9838
 
1.6%
6119
 
0.2%
017
 
< 0.1%
ValueCountFrequency (%)
-1415
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin202736
78.6%
Common55245
 
21.4%

Most frequent character per script

ValueCountFrequency (%)
A22855
11.3%
T20190
 
10.0%
P19247
 
9.5%
F16944
 
8.4%
R13203
 
6.5%
S12294
 
6.1%
O11995
 
5.9%
G11744
 
5.8%
X11478
 
5.7%
E9187
 
4.5%
Other values (15)53599
26.4%
ValueCountFrequency (%)
117683
32.0%
214234
25.8%
310028
18.2%
56103
 
11.0%
41957
 
3.5%
81854
 
3.4%
-1415
 
2.6%
7997
 
1.8%
9838
 
1.5%
6119
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII257981
100.0%

Most frequent character per block

ValueCountFrequency (%)
A22855
 
8.9%
T20190
 
7.8%
P19247
 
7.5%
117683
 
6.9%
F16944
 
6.6%
214234
 
5.5%
R13203
 
5.1%
S12294
 
4.8%
O11995
 
4.6%
G11744
 
4.6%
Other values (26)97592
37.8%

Gene 22
Categorical

HIGH CARDINALITY
MISSING

Distinct 135
Distinct (%) 0.2%
Missing 26233
Missing (%) 32.2%
Memory size 4.0 MiB

IRF8 5782
KLF2 4580
GATA4 3242
NR3C1 2967
TP53 2441
Other values (130) 36243

Length

Max length 7
Median length 5
Mean length 4.538756674
Min length 2

Characters and Unicode

Total characters 250789
Distinct characters 36
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 14
Unique (%) < 0.1%

Sample

1st row VDR
2nd row SP110
3rd row RAG1
4th row TET1
5th row E2F7
Value Count Frequency (%)
IRF8 5782 7.1%
KLF2 4580 5.6%
GATA4 3242 4.0%
NR3C1 2967 3.6%
TP53 2441 3.0%
FOXO1 2264 2.8%
MEF2A 2239 2.7%
CEBPA 2238 2.7%
STAT1 1985 2.4%
PRDM1 1836 2.3%
Other values (125) 25681 31.5%
(Missing) 26233 32.2%

Histogram of lengths of the category

Value Count Frequency (%)
irf8 5782 10.5%
klf2 4580 8.3%
gata4 3242 5.9%
nr3c1 2967 5.4%
tp53 2441 4.4%
foxo1 2264 4.1%
mef2a 2239 4.1%
cebpa 2238 4.1%
stat1 1985 3.6%
prdm1 1836 3.3%
Other values (125) 25681 46.5%

Most occurring characters

ValueCountFrequency (%)
F22290
 
8.9%
A21967
 
8.8%
R16503
 
6.6%
215821
 
6.3%
115241
 
6.1%
T13362
 
5.3%
P13107
 
5.2%
O10515
 
4.2%
S10030
 
4.0%
G9673
 
3.9%
Other values (26)102280
40.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter194783
77.7%
Decimal Number54644
 
21.8%
Dash Punctuation1362
 
0.5%

Most frequent character per category

ValueCountFrequency (%)
F22290
 
11.4%
A21967
 
11.3%
R16503
 
8.5%
T13362
 
6.9%
P13107
 
6.7%
O10515
 
5.4%
S10030
 
5.1%
G9673
 
5.0%
X9169
 
4.7%
N8995
 
4.6%
Other values (15)59172
30.4%
ValueCountFrequency (%)
215821
29.0%
115241
27.9%
39029
16.5%
85846
 
10.7%
53835
 
7.0%
43506
 
6.4%
7823
 
1.5%
9295
 
0.5%
6189
 
0.3%
059
 
0.1%
ValueCountFrequency (%)
-1362
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin194783
77.7%
Common56006
 
22.3%

Most frequent character per script

ValueCountFrequency (%)
F22290
 
11.4%
A21967
 
11.3%
R16503
 
8.5%
T13362
 
6.9%
P13107
 
6.7%
O10515
 
5.4%
S10030
 
5.1%
G9673
 
5.0%
X9169
 
4.7%
N8995
 
4.6%
Other values (15)59172
30.4%
ValueCountFrequency (%)
215821
28.2%
115241
27.2%
39029
16.1%
85846
 
10.4%
53835
 
6.8%
43506
 
6.3%
-1362
 
2.4%
7823
 
1.5%
9295
 
0.5%
6189
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII250789
100.0%

Most frequent character per block

ValueCountFrequency (%)
F22290
 
8.9%
A21967
 
8.8%
R16503
 
6.6%
215821
 
6.3%
115241
 
6.1%
T13362
 
5.3%
P13107
 
5.2%
O10515
 
4.2%
S10030
 
4.0%
G9673
 
3.9%
Other values (26)102280
40.8%

Gene 23
Categorical

HIGH CARDINALITY
MISSING

Distinct 148
Distinct (%) 0.3%
Missing 27594
Missing (%) 33.9%
Memory size 4.0 MiB

GATA4 4295
FOXO1 3965
GATA2 3252
MEF2C 3199
IRF8 2620
Other values (143) 36563

Length

Max length 7
Median length 5
Mean length 4.608101087
Min length 2

Characters and Unicode

Total characters 248349
Distinct characters 36
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 10
Unique (%) < 0.1%

Sample

1st row TET1
2nd row POU2AF1
3rd row TAL2
4th row ZBTB49
5th row ZBTB49
Value Count Frequency (%)
GATA4 4295 5.3%
FOXO1 3965 4.9%
GATA2 3252 4.0%
MEF2C 3199 3.9%
IRF8 2620 3.2%
EBF1 2458 3.0%
PPARG 2400 2.9%
KLF2 2384 2.9%
JUN 2239 2.7%
MSX1 1790 2.2%
Other values (138) 25292 31.0%
(Missing) 27594 33.9%

Histogram of lengths of the category

Value Count Frequency (%)
gata4 4295 8.0%
foxo1 3965 7.4%
gata2 3252 6.0%
mef2c 3199 5.9%
irf8 2620 4.9%
ebf1 2458 4.6%
pparg 2400 4.5%
klf2 2384 4.4%
jun 2239 4.2%
msx1 1790 3.3%
Other values (138) 25292 46.9%

Most occurring characters

ValueCountFrequency (%)
A26712
 
10.8%
F21612
 
8.7%
118305
 
7.4%
T15867
 
6.4%
215198
 
6.1%
O12634
 
5.1%
P12496
 
5.0%
R12332
 
5.0%
G11728
 
4.7%
X11175
 
4.5%
Other values (26)90290
36.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter197846
79.7%
Decimal Number50253
 
20.2%
Dash Punctuation250
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
A26712
13.5%
F21612
10.9%
T15867
 
8.0%
O12634
 
6.4%
P12496
 
6.3%
R12332
 
6.2%
G11728
 
5.9%
X11175
 
5.6%
E10677
 
5.4%
S10522
 
5.3%
Other values (15)52091
26.3%
ValueCountFrequency (%)
118305
36.4%
215198
30.2%
36591
 
13.1%
45383
 
10.7%
82676
 
5.3%
51674
 
3.3%
6171
 
0.3%
7123
 
0.2%
9104
 
0.2%
028
 
0.1%
ValueCountFrequency (%)
-250
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin197846
79.7%
Common50503
 
20.3%

Most frequent character per script

ValueCountFrequency (%)
A26712
13.5%
F21612
10.9%
T15867
 
8.0%
O12634
 
6.4%
P12496
 
6.3%
R12332
 
6.2%
G11728
 
5.9%
X11175
 
5.6%
E10677
 
5.4%
S10522
 
5.3%
Other values (15)52091
26.3%
ValueCountFrequency (%)
118305
36.2%
215198
30.1%
36591
 
13.1%
45383
 
10.7%
82676
 
5.3%
51674
 
3.3%
-250
 
0.5%
6171
 
0.3%
7123
 
0.2%
9104
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII248349
100.0%

Most frequent character per block

ValueCountFrequency (%)
A26712
 
10.8%
F21612
 
8.7%
118305
 
7.4%
T15867
 
6.4%
215198
 
6.1%
O12634
 
5.1%
P12496
 
5.0%
R12332
 
5.0%
G11728
 
4.7%
X11175
 
4.5%
Other values (26)90290
36.4%

Gene 24
Categorical

HIGH CARDINALITY
MISSING

Distinct 170
Distinct (%) 0.3%
Missing 28899
Missing (%) 35.5%
Memory size 4.0 MiB

MEF2C 4238
FOXP3 3514
PPARG 3351
SPI1 3252
NR3C1 3074
Other values (165) 35160

Length

Max length 7
Median length 5
Mean length 4.603167963
Min length 2

Characters and Unicode

Total characters 242076
Distinct characters 36
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 20
Unique (%) < 0.1%

Sample

1st row TBX5
2nd row CEBPD
3rd row POU2AF1
4th row TWIST2
5th row FOXK2
Value Count Frequency (%)
MEF2C 4238 5.2%
FOXP3 3514 4.3%
PPARG 3351 4.1%
SPI1 3252 4.0%
NR3C1 3074 3.8%
IRF8 3065 3.8%
GATA4 2290 2.8%
SREBF1 2119 2.6%
PAX3 1790 2.2%
FOXO1 1707 2.1%
Other values (160) 24189 29.7%
(Missing) 28899 35.5%

Histogram of lengths of the category

Value Count Frequency (%)
mef2c 4238 8.1%
foxp3 3514 6.7%
pparg 3351 6.4%
spi1 3252 6.2%
nr3c1 3074 5.8%
irf8 3065 5.8%
gata4 2290 4.4%
srebf1 2119 4.0%
pax3 1790 3.4%
foxo1 1707 3.2%
Other values (160) 24189 46.0%

Most occurring characters

ValueCountFrequency (%)
F20995
 
8.7%
P19450
 
8.0%
119185
 
7.9%
A18334
 
7.6%
R15508
 
6.4%
S11932
 
4.9%
311768
 
4.9%
E11028
 
4.6%
T10998
 
4.5%
O10732
 
4.4%
Other values (26)92146
38.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter192547
79.5%
Decimal Number49333
 
20.4%
Dash Punctuation196
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
F20995
 
10.9%
P19450
 
10.1%
A18334
 
9.5%
R15508
 
8.1%
S11932
 
6.2%
E11028
 
5.7%
T10998
 
5.7%
O10732
 
5.6%
X10679
 
5.5%
C9427
 
4.9%
Other values (15)53464
27.8%
ValueCountFrequency (%)
119185
38.9%
311768
23.9%
29350
19.0%
43676
 
7.5%
83110
 
6.3%
51962
 
4.0%
7129
 
0.3%
667
 
0.1%
944
 
0.1%
042
 
0.1%
ValueCountFrequency (%)
-196
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin192547
79.5%
Common49529
 
20.5%

Most frequent character per script

ValueCountFrequency (%)
F20995
 
10.9%
P19450
 
10.1%
A18334
 
9.5%
R15508
 
8.1%
S11932
 
6.2%
E11028
 
5.7%
T10998
 
5.7%
O10732
 
5.6%
X10679
 
5.5%
C9427
 
4.9%
Other values (15)53464
27.8%
ValueCountFrequency (%)
119185
38.7%
311768
23.8%
29350
18.9%
43676
 
7.4%
83110
 
6.3%
51962
 
4.0%
-196
 
0.4%
7129
 
0.3%
667
 
0.1%
944
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII242076
100.0%

Most frequent character per block

ValueCountFrequency (%)
F20995
 
8.7%
P19450
 
8.0%
119185
 
7.9%
A18334
 
7.6%
R15508
 
6.4%
S11932
 
4.9%
311768
 
4.9%
E11028
 
4.6%
T10998
 
4.5%
O10732
 
4.4%
Other values (26)92146
38.1%

Gene 25
Categorical

HIGH CARDINALITY
MISSING

Distinct 174
Distinct (%) 0.3%
Missing 30382
Missing (%) 37.3%
Memory size 3.9 MiB

IRF8 4587
NR3C1 4041
MYOD1 2571
FOXO1 2517
PPARG 2425
Other values (169) 34965

Length

Max length 7
Median length 5
Mean length 4.509940124
Min length 2

Characters and Unicode

Total characters 230485
Distinct characters 36
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 14
Unique (%) < 0.1%

Sample

1st row ZNF385A
2nd row PLSCR1
3rd row PLAGL1
4th row ARID3A
5th row BATF
Value Count Frequency (%)
IRF8 4587 5.6%
NR3C1 4041 5.0%
MYOD1 2571 3.2%
FOXO1 2517 3.1%
PPARG 2425 3.0%
EBF1 2301 2.8%
MEF2C 2260 2.8%
KLF4 2188 2.7%
AR 2119 2.6%
FOS 2028 2.5%
Other values (164) 24069 29.5%
(Missing) 30382 37.3%

Histogram of lengths of the category

Value Count Frequency (%)
irf8 4587 9.0%
nr3c1 4041 7.9%
myod1 2571 5.0%
foxo1 2517 4.9%
pparg 2425 4.7%
ebf1 2301 4.5%
mef2c 2260 4.4%
klf4 2188 4.3%
ar 2119 4.1%
fos 2028 4.0%
Other values (164) 24069 47.1%

Most occurring characters

ValueCountFrequency (%)
F22348
 
9.7%
120960
 
9.1%
R18125
 
7.9%
A15973
 
6.9%
O14186
 
6.2%
P12904
 
5.6%
39938
 
4.3%
E9835
 
4.3%
T9438
 
4.1%
S9083
 
3.9%
Other values (26)87695
38.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter183093
79.4%
Decimal Number47329
 
20.5%
Dash Punctuation63
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
F22348
12.2%
R18125
 
9.9%
A15973
 
8.7%
O14186
 
7.7%
P12904
 
7.0%
E9835
 
5.4%
T9438
 
5.2%
S9083
 
5.0%
M8346
 
4.6%
C8138
 
4.4%
Other values (15)54717
29.9%
ValueCountFrequency (%)
120960
44.3%
39938
21.0%
26442
 
13.6%
84632
 
9.8%
43300
 
7.0%
51640
 
3.5%
9156
 
0.3%
0120
 
0.3%
790
 
0.2%
651
 
0.1%
ValueCountFrequency (%)
-63
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin183093
79.4%
Common47392
 
20.6%

Most frequent character per script

ValueCountFrequency (%)
F22348
12.2%
R18125
 
9.9%
A15973
 
8.7%
O14186
 
7.7%
P12904
 
7.0%
E9835
 
5.4%
T9438
 
5.2%
S9083
 
5.0%
M8346
 
4.6%
C8138
 
4.4%
Other values (15)54717
29.9%
ValueCountFrequency (%)
120960
44.2%
39938
21.0%
26442
 
13.6%
84632
 
9.8%
43300
 
7.0%
51640
 
3.5%
9156
 
0.3%
0120
 
0.3%
790
 
0.2%
-63
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII230485
100.0%

Most frequent character per block

ValueCountFrequency (%)
F22348
 
9.7%
120960
 
9.1%
R18125
 
7.9%
A15973
 
6.9%
O14186
 
6.2%
P12904
 
5.6%
39938
 
4.3%
E9835
 
4.3%
T9438
 
4.1%
S9083
 
3.9%
Other values (26)87695
38.0%

Gene 26
Categorical

HIGH CARDINALITY
MISSING

Distinct 178
Distinct (%) 0.4%
Missing 31833
Missing (%) 39.1%
Memory size 3.9 MiB

TWIST1 4006
FOXO1 3242
IRF8 2596
FOXP3 2567
EBF1 2494
Other values (173) 34750

Length

Max length 7
Median length 5
Mean length 4.616614641
Min length 2

Characters and Unicode

Total characters 229238
Distinct characters 36
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 21
Unique (%) < 0.1%

Sample

1st row HIC1
2nd row TAL2
3rd row MXI1
4th row NR2F2
5th row LIN28B
Value Count Frequency (%)
TWIST1 4006 4.9%
FOXO1 3242 4.0%
IRF8 2596 3.2%
FOXP3 2567 3.2%
EBF1 2494 3.1%
PPARG 2492 3.1%
MYOG 2486 3.1%
NR3C1 2156 2.6%
GATA2 2095 2.6%
PRDM1 2001 2.5%
Other values (168) 23520 28.9%
(Missing) 31833 39.1%

Histogram of lengths of the category

Value Count Frequency (%)
twist1 4006 8.1%
foxo1 3242 6.5%
irf8 2596 5.2%
foxp3 2567 5.2%
ebf1 2494 5.0%
pparg 2492 5.0%
myog 2486 5.0%
nr3c1 2156 4.3%
gata2 2095 4.2%
prdm1 2001 4.0%
Other values (168) 23520 47.4%

Most occurring characters

ValueCountFrequency (%)
120950
 
9.1%
F20264
 
8.8%
O19297
 
8.4%
A16547
 
7.2%
T15975
 
7.0%
R13894
 
6.1%
P12896
 
5.6%
M10479
 
4.6%
X10403
 
4.5%
S9682
 
4.2%
Other values (26)78851
34.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter186459
81.3%
Decimal Number42742
 
18.6%
Dash Punctuation37
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
F20264
10.9%
O19297
10.3%
A16547
 
8.9%
T15975
 
8.6%
R13894
 
7.5%
P12896
 
6.9%
M10479
 
5.6%
X10403
 
5.6%
S9682
 
5.2%
G9347
 
5.0%
Other values (15)47675
25.6%
ValueCountFrequency (%)
120950
49.0%
38596
20.1%
25778
 
13.5%
42695
 
6.3%
82632
 
6.2%
51720
 
4.0%
9227
 
0.5%
793
 
0.2%
626
 
0.1%
025
 
0.1%
ValueCountFrequency (%)
-37
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin186459
81.3%
Common42779
 
18.7%

Most frequent character per script

ValueCountFrequency (%)
F20264
10.9%
O19297
10.3%
A16547
 
8.9%
T15975
 
8.6%
R13894
 
7.5%
P12896
 
6.9%
M10479
 
5.6%
X10403
 
5.6%
S9682
 
5.2%
G9347
 
5.0%
Other values (15)47675
25.6%
ValueCountFrequency (%)
120950
49.0%
38596
20.1%
25778
 
13.5%
42695
 
6.3%
82632
 
6.2%
51720
 
4.0%
9227
 
0.5%
793
 
0.2%
-37
 
0.1%
626
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII229238
100.0%

Most frequent character per block

ValueCountFrequency (%)
120950
 
9.1%
F20264
 
8.8%
O19297
 
8.4%
A16547
 
7.2%
T15975
 
7.0%
R13894
 
6.1%
P12896
 
5.6%
M10479
 
4.6%
X10403
 
4.5%
S9682
 
4.2%
Other values (26)78851
34.4%

Gene 27
Categorical

HIGH CARDINALITY
MISSING

Distinct 184
Distinct (%) 0.4%
Missing 33309
Missing (%) 40.9%
Memory size 3.8 MiB

FOXM1 4361
FOXP3 3417
MEF2A 3400
IRF8 3217
FOS 2597
Other values (179) 31187

Length

Max length 7
Median length 5
Mean length 4.556902385
Min length 2

Characters and Unicode

Total characters 219547
Distinct characters 36
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 22
Unique (%) < 0.1%

Sample

1st row REL
2nd row SKIL
3rd row ARID3A
4th row POU2AF1
5th row NR2F2
Value Count Frequency (%)
FOXM1 4361 5.4%
FOXP3 3417 4.2%
MEF2A 3400 4.2%
IRF8 3217 3.9%
FOS 2597 3.2%
PPARG 2450 3.0%
TWIST1 2432 3.0%
SPI1 2095 2.6%
MSX1 1937 2.4%
PRDM1 1795 2.2%
Other values (174) 20478 25.1%
(Missing) 33309 40.9%

Histogram of lengths of the category

Value Count Frequency (%)
foxm1 4361 9.1%
foxp3 3417 7.1%
mef2a 3400 7.1%
irf8 3217 6.7%
fos 2597 5.4%
pparg 2450 5.1%
twist1 2432 5.0%
spi1 2095 4.3%
msx1 1937 4.0%
prdm1 1795 3.7%
Other values (174) 20478 42.5%

Most occurring characters

ValueCountFrequency (%)
F23940
 
10.9%
118797
 
8.6%
O17543
 
8.0%
M15871
 
7.2%
P14090
 
6.4%
A14057
 
6.4%
X13541
 
6.2%
R11159
 
5.1%
S11115
 
5.1%
T9559
 
4.4%
Other values (26)69875
31.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter179562
81.8%
Decimal Number39940
 
18.2%
Dash Punctuation45
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
F23940
13.3%
O17543
9.8%
M15871
 
8.8%
P14090
 
7.8%
A14057
 
7.8%
X13541
 
7.5%
R11159
 
6.2%
S11115
 
6.2%
T9559
 
5.3%
E8710
 
4.9%
Other values (15)39977
22.3%
ValueCountFrequency (%)
118797
47.1%
28264
20.7%
36509
 
16.3%
83233
 
8.1%
41801
 
4.5%
9646
 
1.6%
5507
 
1.3%
793
 
0.2%
058
 
0.1%
632
 
0.1%
ValueCountFrequency (%)
-45
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin179562
81.8%
Common39985
 
18.2%

Most frequent character per script

ValueCountFrequency (%)
F23940
13.3%
O17543
9.8%
M15871
 
8.8%
P14090
 
7.8%
A14057
 
7.8%
X13541
 
7.5%
R11159
 
6.2%
S11115
 
6.2%
T9559
 
5.3%
E8710
 
4.9%
Other values (15)39977
22.3%
ValueCountFrequency (%)
118797
47.0%
28264
20.7%
36509
 
16.3%
83233
 
8.1%
41801
 
4.5%
9646
 
1.6%
5507
 
1.3%
793
 
0.2%
058
 
0.1%
-45
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII219547
100.0%

Most frequent character per block

ValueCountFrequency (%)
F23940
 
10.9%
118797
 
8.6%
O17543
 
8.0%
M15871
 
7.2%
P14090
 
6.4%
A14057
 
6.4%
X13541
 
6.2%
R11159
 
5.1%
S11115
 
5.1%
T9559
 
4.4%
Other values (26)69875
31.8%

Gene 28
Categorical

HIGH CARDINALITY
MISSING

Distinct 188
Distinct (%) 0.4%
Missing 34822
Missing (%) 42.7%
Memory size 3.8 MiB

IRF8 3360
JUN 3313
FOS 2867
STAT3 2653
PRDM1 2559
Other values (183) 31914

Length

Max length 7
Median length 5
Mean length 4.395684224
Min length 2

Characters and Unicode

Total characters 205129
Distinct characters 36
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 16
Unique (%) < 0.1%

Sample

1st row REST
2nd row NR2F2
3rd row MYF5
4th row MXI1
5th row FOXF1
Value Count Frequency (%)
IRF8 3360 4.1%
JUN 3313 4.1%
FOS 2867 3.5%
STAT3 2653 3.3%
PRDM1 2559 3.1%
FOXM1 2448 3.0%
GATA2 2214 2.7%
KLF4 2013 2.5%
PAX3 1937 2.4%
FOXP3 1899 2.3%
Other values (178) 21403 26.3%
(Missing) 34822 42.7%

Histogram of lengths of the category

Value Count Frequency (%)
irf8 3360 7.2%
jun 3313 7.1%
fos 2867 6.1%
stat3 2653 5.7%
prdm1 2559 5.5%
foxm1 2448 5.2%
gata2 2214 4.7%
klf4 2013 4.3%
pax3 1937 4.2%
foxp3 1899 4.1%
Other values (178) 21403 45.9%

Most occurring characters

ValueCountFrequency (%)
F18583
 
9.1%
A15553
 
7.6%
114761
 
7.2%
O13304
 
6.5%
S12015
 
5.9%
M11931
 
5.8%
R11563
 
5.6%
X11502
 
5.6%
T11286
 
5.5%
P10737
 
5.2%
Other values (26)73894
36.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter166354
81.1%
Decimal Number38544
 
18.8%
Dash Punctuation231
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
F18583
11.2%
A15553
 
9.3%
O13304
 
8.0%
S12015
 
7.2%
M11931
 
7.2%
R11563
 
7.0%
X11502
 
6.9%
T11286
 
6.8%
P10737
 
6.5%
N6738
 
4.1%
Other values (15)43142
25.9%
ValueCountFrequency (%)
114761
38.3%
38563
22.2%
26769
17.6%
83372
 
8.7%
42961
 
7.7%
91663
 
4.3%
5287
 
0.7%
795
 
0.2%
641
 
0.1%
032
 
0.1%
ValueCountFrequency (%)
-231
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin166354
81.1%
Common38775
 
18.9%

Most frequent character per script

ValueCountFrequency (%)
F18583
11.2%
A15553
 
9.3%
O13304
 
8.0%
S12015
 
7.2%
M11931
 
7.2%
R11563
 
7.0%
X11502
 
6.9%
T11286
 
6.8%
P10737
 
6.5%
N6738
 
4.1%
Other values (15)43142
25.9%
ValueCountFrequency (%)
114761
38.1%
38563
22.1%
26769
17.5%
83372
 
8.7%
42961
 
7.6%
91663
 
4.3%
5287
 
0.7%
-231
 
0.6%
795
 
0.2%
641
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII205129
100.0%

Most frequent character per block

ValueCountFrequency (%)
F18583
 
9.1%
A15553
 
7.6%
114761
 
7.2%
O13304
 
6.5%
S12015
 
5.9%
M11931
 
5.8%
R11563
 
5.6%
X11502
 
5.6%
T11286
 
5.5%
P10737
 
5.2%
Other values (26)73894
36.0%

Gene 29
Categorical

HIGH CARDINALITY
MISSING

Distinct 188
Distinct (%) 0.4%
Missing 36112
Missing (%) 44.3%
Memory size 3.8 MiB

PRDM1 2868
MYOD1 2659
GATA2 2525
MSX1 2463
SPI1 2214
Other values (183) 32647

Length

Max length 7
Median length 5
Mean length 4.50368036
Min length 2

Characters and Unicode

Total characters 204359
Distinct characters 36
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 23
Unique (%) 0.1%

Sample

1st row ZNF423
2nd row ZNF423
3rd row FEV
4th row RORC
5th row HIF1A
Value Count Frequency (%)
PRDM1 2868 3.5%
MYOD1 2659 3.3%
GATA2 2525 3.1%
MSX1 2463 3.0%
SPI1 2214 2.7%
FOS 2087 2.6%
IRF8 1977 2.4%
STAT3 1818 2.2%
PAX3 1736 2.1%
JUN 1708 2.1%
Other values (178) 23321 28.6%
(Missing) 36112 44.3%

Histogram of lengths of the category

Value Count Frequency (%)
prdm1 2868 6.3%
myod1 2659 5.9%
gata2 2525 5.6%
msx1 2463 5.4%
spi1 2214 4.9%
fos 2087 4.6%
irf8 1977 4.4%
stat3 1818 4.0%
pax3 1736 3.8%
jun 1708 3.8%
Other values (178) 23321 51.4%

Most occurring characters

ValueCountFrequency (%)
118411
 
9.0%
A15375
 
7.5%
O14057
 
6.9%
F13787
 
6.7%
R13275
 
6.5%
M13252
 
6.5%
P12220
 
6.0%
S11967
 
5.9%
T10462
 
5.1%
X9841
 
4.8%
Other values (26)71712
35.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter168049
82.2%
Decimal Number36185
 
17.7%
Dash Punctuation125
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
A15375
 
9.1%
O14057
 
8.4%
F13787
 
8.2%
R13275
 
7.9%
M13252
 
7.9%
P12220
 
7.3%
S11967
 
7.1%
T10462
 
6.2%
X9841
 
5.9%
D7252
 
4.3%
Other values (15)46561
27.7%
ValueCountFrequency (%)
118411
50.9%
36843
 
18.9%
26415
 
17.7%
81997
 
5.5%
41700
 
4.7%
9460
 
1.3%
5198
 
0.5%
781
 
0.2%
668
 
0.2%
012
 
< 0.1%
ValueCountFrequency (%)
-125
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin168049
82.2%
Common36310
 
17.8%

Most frequent character per script

ValueCountFrequency (%)
A15375
 
9.1%
O14057
 
8.4%
F13787
 
8.2%
R13275
 
7.9%
M13252
 
7.9%
P12220
 
7.3%
S11967
 
7.1%
T10462
 
6.2%
X9841
 
5.9%
D7252
 
4.3%
Other values (15)46561
27.7%
ValueCountFrequency (%)
118411
50.7%
36843
 
18.8%
26415
 
17.7%
81997
 
5.5%
41700
 
4.7%
9460
 
1.3%
5198
 
0.5%
-125
 
0.3%
781
 
0.2%
668
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII204359
100.0%

Most frequent character per block

ValueCountFrequency (%)
118411
 
9.0%
A15375
 
7.5%
O14057
 
6.9%
F13787
 
6.7%
R13275
 
6.5%
M13252
 
6.5%
P12220
 
6.0%
S11967
 
5.9%
T10462
 
5.1%
X9841
 
4.8%
Other values (26)71712
35.1%

Gene 30
Categorical

HIGH CARDINALITY
MISSING

Distinct 185
Distinct (%) 0.4%
Missing 37671
Missing (%) 46.2%
Memory size 3.7 MiB

MSX1 2775
FOXP3 2626
SPI1 2524
MYOG 2512
PAX3 2463
Other values (180) 30917

Length

Max length 7
Median length 4
Mean length 4.445215327
Min length 2

Characters and Unicode

Total characters 194776
Distinct characters 36
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 17
Unique (%) < 0.1%

Sample

1st row CEBPD
2nd row BATF
3rd row PAX7
4th row ALX1
5th row CEBPD
Value Count Frequency (%)
MSX1 2775 3.4%
FOXP3 2626 3.2%
SPI1 2524 3.1%
MYOG 2512 3.1%
PAX3 2463 3.0%
MYOD1 2226 2.7%
FOS 2217 2.7%
PRDM1 2030 2.5%
KLF4 1996 2.4%
IRF8 1575 1.9%
Other values (175) 20873 25.6%
(Missing) 37671 46.2%

Histogram of lengths of the category

Value Count Frequency (%)
msx1 2775 6.3%
foxp3 2626 6.0%
spi1 2524 5.8%
myog 2512 5.7%
pax3 2463 5.6%
myod1 2226 5.1%
fos 2217 5.1%
prdm1 2030 4.6%
klf4 1996 4.6%
irf8 1575 3.6%
Other values (175) 20873 47.6%

Most occurring characters

ValueCountFrequency (%)
117937
 
9.2%
O14662
 
7.5%
A14625
 
7.5%
F14440
 
7.4%
M12932
 
6.6%
P12596
 
6.5%
S11440
 
5.9%
X11140
 
5.7%
R10915
 
5.6%
T9050
 
4.6%
Other values (26)65039
33.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter160003
82.1%
Decimal Number34595
 
17.8%
Dash Punctuation178
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
O14662
 
9.2%
A14625
 
9.1%
F14440
 
9.0%
M12932
 
8.1%
P12596
 
7.9%
S11440
 
7.1%
X11140
 
7.0%
R10915
 
6.8%
T9050
 
5.7%
N6656
 
4.2%
Other values (15)41547
26.0%
ValueCountFrequency (%)
117937
51.8%
37351
21.2%
24228
 
12.2%
42495
 
7.2%
81661
 
4.8%
5556
 
1.6%
9154
 
0.4%
7115
 
0.3%
682
 
0.2%
016
 
< 0.1%
ValueCountFrequency (%)
-178
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin160003
82.1%
Common34773
 
17.9%

Most frequent character per script

ValueCountFrequency (%)
O14662
 
9.2%
A14625
 
9.1%
F14440
 
9.0%
M12932
 
8.1%
P12596
 
7.9%
S11440
 
7.1%
X11140
 
7.0%
R10915
 
6.8%
T9050
 
5.7%
N6656
 
4.2%
Other values (15)41547
26.0%
ValueCountFrequency (%)
117937
51.6%
37351
21.1%
24228
 
12.2%
42495
 
7.2%
81661
 
4.8%
5556
 
1.6%
-178
 
0.5%
9154
 
0.4%
7115
 
0.3%
682
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII194776
100.0%

Most frequent character per block

ValueCountFrequency (%)
117937
 
9.2%
O14662
 
7.5%
A14625
 
7.5%
F14440
 
7.4%
M12932
 
6.6%
P12596
 
6.5%
S11440
 
5.9%
X11140
 
5.7%
R10915
 
5.6%
T9050
 
4.6%
Other values (26)65039
33.4%

Gene 31
Categorical

HIGH CARDINALITY
MISSING

Distinct 194
Distinct (%) 0.5%
Missing 39334
Missing (%) 48.3%
Memory size 3.7 MiB

PAX3 2775
MYOD1 2667
IRF8 2626
MEF2A 2512
PRDM1 2382
Other values (189) 29192

Length

Max length 7
Median length 5
Mean length 4.606016036
Min length 2

Characters and Unicode

Total characters 194162
Distinct characters 36
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 17
Unique (%) < 0.1%

Sample

1st row GRHL1
2nd row NR4A2
3rd row FOSL1
4th row REST
5th row HIF1A
Value Count Frequency (%)
PAX3 2775 3.4%
MYOD1 2667 3.3%
IRF8 2626 3.2%
MEF2A 2512 3.1%
PRDM1 2382 2.9%
MYOG 2117 2.6%
KLF4 2028 2.5%
MSX1 1964 2.4%
FOXP3 1837 2.3%
SPI1 1439 1.8%
Other values (184) 19807 24.3%
(Missing) 39334 48.3%

Histogram of lengths of the category

Value Count Frequency (%)
pax3 2775 6.6%
myod1 2667 6.3%
irf8 2626 6.2%
mef2a 2512 6.0%
prdm1 2382 5.7%
myog 2117 5.0%
klf4 2028 4.8%
msx1 1964 4.7%
foxp3 1837 4.4%
spi1 1439 3.4%
Other values (184) 19807 47.0%

Most occurring characters

ValueCountFrequency (%)
117678
 
9.1%
F16051
 
8.3%
A15619
 
8.0%
M13568
 
7.0%
O12679
 
6.5%
P12133
 
6.2%
X11221
 
5.8%
T8838
 
4.6%
R8524
 
4.4%
S7618
 
3.9%
Other values (26)70233
36.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter156386
80.5%
Decimal Number36750
 
18.9%
Dash Punctuation1026
 
0.5%

Most frequent character per category

ValueCountFrequency (%)
F16051
 
10.3%
A15619
 
10.0%
M13568
 
8.7%
O12679
 
8.1%
P12133
 
7.8%
X11221
 
7.2%
T8838
 
5.7%
R8524
 
5.5%
S7618
 
4.9%
N6136
 
3.9%
Other values (15)43999
28.1%
ValueCountFrequency (%)
117678
48.1%
37456
20.3%
25277
 
14.4%
82690
 
7.3%
42426
 
6.6%
5869
 
2.4%
9143
 
0.4%
7106
 
0.3%
054
 
0.1%
651
 
0.1%
ValueCountFrequency (%)
-1026
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin156386
80.5%
Common37776
 
19.5%

Most frequent character per script

ValueCountFrequency (%)
F16051
 
10.3%
A15619
 
10.0%
M13568
 
8.7%
O12679
 
8.1%
P12133
 
7.8%
X11221
 
7.2%
T8838
 
5.7%
R8524
 
5.5%
S7618
 
4.9%
N6136
 
3.9%
Other values (15)43999
28.1%
ValueCountFrequency (%)
117678
46.8%
37456
19.7%
25277
 
14.0%
82690
 
7.1%
42426
 
6.4%
-1026
 
2.7%
5869
 
2.3%
9143
 
0.4%
7106
 
0.3%
054
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII194162
100.0%

Most frequent character per block

ValueCountFrequency (%)
117678
 
9.1%
F16051
 
8.3%
A15619
 
8.0%
M13568
 
7.0%
O12679
 
6.5%
P12133
 
6.2%
X11221
 
5.8%
T8838
 
4.6%
R8524
 
4.4%
S7618
 
3.9%
Other values (26)70233
36.2%

Gene 32
Categorical

HIGH CARDINALITY
MISSING

Distinct 204
Distinct (%) 0.5%
Missing 40821
Missing (%) 50.1%
Memory size 3.6 MiB

MYOD1 2509
MYOG 2505
GATA2 2497
JUN 2452
MSX1 2282
Other values (199) 28422

Length

Max length 7
Median length 5
Mean length 4.503823739
Min length 2

Characters and Unicode

Total characters 183157
Distinct characters 36
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 9
Unique (%) < 0.1%

Sample

1st row SIX6
2nd row ARID3A
3rd row BARX2
4th row SOX6
5th row GRHL1
Value Count Frequency (%)
MYOD1 2509 3.1%
MYOG 2505 3.1%
GATA2 2497 3.1%
JUN 2452 3.0%
MSX1 2282 2.8%
MEF2A 2117 2.6%
PAX3 1964 2.4%
IRF8 1653 2.0%
CEBPB 1526 1.9%
KLF4 1358 1.7%
Other values (194) 19804 24.3%
(Missing) 40821 50.1%

Histogram of lengths of the category

Value Count Frequency (%)
myod1 2509 6.2%
myog 2505 6.2%
gata2 2497 6.1%
jun 2452 6.0%
msx1 2282 5.6%
mef2a 2117 5.2%
pax3 1964 4.8%
irf8 1653 4.1%
cebpb 1526 3.8%
klf4 1358 3.3%
Other values (194) 19804 48.7%

Most occurring characters

ValueCountFrequency (%)
A15042
 
8.2%
114463
 
7.9%
F13583
 
7.4%
O13258
 
7.2%
M12795
 
7.0%
X10036
 
5.5%
P9851
 
5.4%
T8288
 
4.5%
S8241
 
4.5%
N7249
 
4.0%
Other values (26)70351
38.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter148956
81.3%
Decimal Number33044
 
18.0%
Dash Punctuation1157
 
0.6%

Most frequent character per category

ValueCountFrequency (%)
A15042
 
10.1%
F13583
 
9.1%
O13258
 
8.9%
M12795
 
8.6%
X10036
 
6.7%
P9851
 
6.6%
T8288
 
5.6%
S8241
 
5.5%
N7249
 
4.9%
G6147
 
4.1%
Other values (15)44466
29.9%
ValueCountFrequency (%)
114463
43.8%
36919
20.9%
26277
19.0%
41869
 
5.7%
81695
 
5.1%
51353
 
4.1%
6160
 
0.5%
9128
 
0.4%
7114
 
0.3%
066
 
0.2%
ValueCountFrequency (%)
-1157
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin148956
81.3%
Common34201
 
18.7%

Most frequent character per script

ValueCountFrequency (%)
A15042
 
10.1%
F13583
 
9.1%
O13258
 
8.9%
M12795
 
8.6%
X10036
 
6.7%
P9851
 
6.6%
T8288
 
5.6%
S8241
 
5.5%
N7249
 
4.9%
G6147
 
4.1%
Other values (15)44466
29.9%
ValueCountFrequency (%)
114463
42.3%
36919
20.2%
26277
18.4%
41869
 
5.5%
81695
 
5.0%
51353
 
4.0%
-1157
 
3.4%
6160
 
0.5%
9128
 
0.4%
7114
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII183157
100.0%

Most frequent character per block

ValueCountFrequency (%)
A15042
 
8.2%
114463
 
7.9%
F13583
 
7.4%
O13258
 
7.2%
M12795
 
7.0%
X10036
 
5.5%
P9851
 
5.4%
T8288
 
4.5%
S8241
 
4.5%
N7249
 
4.0%
Other values (26)70351
38.4%

Gene 33
Categorical

HIGH CARDINALITY
MISSING

Distinct209
Distinct (%)0.5%
Missing42331
Missing (%)51.9%
Memory size3.6 MiB
MEF2A
 
2505
SPI1
 
2497
MYOG
 
2346
PAX3
 
2282
FOXP3
 
2087
Other values (204)
27440 

Length

Max length7
Median length5
Mean length4.471180121
Min length2

Characters and Unicode

Total characters175078
Distinct characters36
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6
Unique (%)< 0.1%

Sample

1st rowGLI1
2nd rowPDX1
3rd rowNFIA
4th rowSATB2
5th rowPDX1
Value               Count   Frequency (%)
MEF2A               2505    3.1%
SPI1                2497    3.1%
MYOG                2346    2.9%
PAX3                2282    2.8%
FOXP3               2087    2.6%
JUN                 2014    2.5%
MYOD1               1744    2.1%
FOS                 1733    2.1%
GATA2               1614    2.0%
PRDM1               1438    1.8%
Other values (199)  18897   23.2%
(Missing)           42331   51.9%
Histogram of lengths of the category
ValueCountFrequency (%)
mef2a2505
 
6.4%
spi12497
 
6.4%
myog2346
 
6.0%
pax32282
 
5.8%
foxp32087
 
5.3%
jun2014
 
5.1%
myod11744
 
4.5%
fos1733
 
4.4%
gata21614
 
4.1%
prdm11438
 
3.7%
Other values (199)18897
48.3%

Most occurring characters

ValueCountFrequency (%)
F15282
 
8.7%
O14594
 
8.3%
112888
 
7.4%
A12648
 
7.2%
P11279
 
6.4%
M10814
 
6.2%
X9721
 
5.6%
S9184
 
5.2%
37801
 
4.5%
T6395
 
3.7%
Other values (26)64472
36.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter142687
81.5%
Decimal Number31974
 
18.3%
Dash Punctuation417
 
0.2%

Most frequent character per category

ValueCountFrequency (%)
F15282
 
10.7%
O14594
 
10.2%
A12648
 
8.9%
P11279
 
7.9%
M10814
 
7.6%
X9721
 
6.8%
S9184
 
6.4%
T6395
 
4.5%
E5987
 
4.2%
N5688
 
4.0%
Other values (15)41095
28.8%
ValueCountFrequency (%)
112888
40.3%
37801
24.4%
26062
19.0%
51926
 
6.0%
41871
 
5.9%
8908
 
2.8%
6216
 
0.7%
7180
 
0.6%
998
 
0.3%
024
 
0.1%
ValueCountFrequency (%)
-417
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin142687
81.5%
Common32391
 
18.5%

Most frequent character per script

ValueCountFrequency (%)
F15282
 
10.7%
O14594
 
10.2%
A12648
 
8.9%
P11279
 
7.9%
M10814
 
7.6%
X9721
 
6.8%
S9184
 
6.4%
T6395
 
4.5%
E5987
 
4.2%
N5688
 
4.0%
Other values (15)41095
28.8%
ValueCountFrequency (%)
112888
39.8%
37801
24.1%
26062
18.7%
51926
 
5.9%
41871
 
5.8%
8908
 
2.8%
-417
 
1.3%
6216
 
0.7%
7180
 
0.6%
998
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII175078
100.0%

Most frequent character per block

ValueCountFrequency (%)
F15282
 
8.7%
O14594
 
8.3%
112888
 
7.4%
A12648
 
7.2%
P11279
 
6.4%
M10814
 
6.2%
X9721
 
5.6%
S9184
 
5.2%
37801
 
4.5%
T6395
 
3.7%
Other values (26)64472
36.8%

Gene 34
Categorical

HIGH CARDINALITY
MISSING

Distinct212
Distinct (%)0.6%
Missing44048
Missing (%)54.1%
Memory size3.5 MiB
FOXP3
 
2527
JUN
 
2454
MYOD1
 
2360
MEF2A
 
2346
KLF4
 
2283
Other values (207)
25470 

Length

Max length7
Median length5
Mean length4.406730769
Min length2

Characters and Unicode

Total characters164988
Distinct characters36
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique12
Unique (%)< 0.1%

Sample

1st rowHIF1A
2nd rowPPARD
3rd rowHES5
4th rowZNF423
5th rowSOX10
Value               Count   Frequency (%)
FOXP3               2527    3.1%
JUN                 2454    3.0%
MYOD1               2360    2.9%
MEF2A               2346    2.9%
KLF4                2283    2.8%
PRDM1               1678    2.1%
MYOG                1650    2.0%
SPI1                1614    2.0%
KLF5                1562    1.9%
FOS                 1521    1.9%
Other values (202)  17445   21.4%
(Missing)           44048   54.1%
Histogram of lengths of the category
ValueCountFrequency (%)
foxp32527
 
6.7%
jun2454
 
6.6%
myod12360
 
6.3%
mef2a2346
 
6.3%
klf42283
 
6.1%
prdm11678
 
4.5%
myog1650
 
4.4%
spi11614
 
4.3%
klf51562
 
4.2%
fos1521
 
4.1%
Other values (202)17445
46.6%

Most occurring characters

ValueCountFrequency (%)
F15748
 
9.5%
O13530
 
8.2%
112013
 
7.3%
A11827
 
7.2%
M10554
 
6.4%
P9072
 
5.5%
X8952
 
5.4%
S7546
 
4.6%
36050
 
3.7%
R5699
 
3.5%
Other values (26)63997
38.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter134768
81.7%
Decimal Number29930
 
18.1%
Dash Punctuation290
 
0.2%

Most frequent character per category

ValueCountFrequency (%)
F15748
 
11.7%
O13530
 
10.0%
A11827
 
8.8%
M10554
 
7.8%
P9072
 
6.7%
X8952
 
6.6%
S7546
 
5.6%
R5699
 
4.2%
N5343
 
4.0%
T4832
 
3.6%
Other values (15)41665
30.9%
ValueCountFrequency (%)
112013
40.1%
36050
20.2%
25144
17.2%
43065
 
10.2%
52219
 
7.4%
81165
 
3.9%
6120
 
0.4%
795
 
0.3%
031
 
0.1%
928
 
0.1%
ValueCountFrequency (%)
-290
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin134768
81.7%
Common30220
 
18.3%

Most frequent character per script

ValueCountFrequency (%)
F15748
 
11.7%
O13530
 
10.0%
A11827
 
8.8%
M10554
 
7.8%
P9072
 
6.7%
X8952
 
6.6%
S7546
 
5.6%
R5699
 
4.2%
N5343
 
4.0%
T4832
 
3.6%
Other values (15)41665
30.9%
ValueCountFrequency (%)
112013
39.8%
36050
20.0%
25144
17.0%
43065
 
10.1%
52219
 
7.3%
81165
 
3.9%
-290
 
1.0%
6120
 
0.4%
795
 
0.3%
031
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII164988
100.0%

Most frequent character per block

ValueCountFrequency (%)
F15748
 
9.5%
O13530
 
8.2%
112013
 
7.3%
A11827
 
7.2%
M10554
 
6.4%
P9072
 
5.5%
X8952
 
5.4%
S7546
 
4.6%
36050
 
3.7%
R5699
 
3.5%
Other values (26)63997
38.8%

Gene 35
Categorical

HIGH CARDINALITY
MISSING

Distinct212
Distinct (%)0.6%
Missing45558
Missing (%)55.9%
Memory size3.5 MiB
JUN
 
2298
MYOG
 
2187
KLF5
 
2135
GATA3
 
2007
CEBPB
 
1918
Other values (207)
25385 

Length

Max length7
Median length4
Mean length4.406790982
Min length2

Characters and Unicode

Total characters158336
Distinct characters36
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique13
Unique (%)< 0.1%

Sample

1st rowSKIL
2nd rowHEY2
3rd rowSIX6
4th rowPHOX2A
5th rowFOSL1
Value               Count   Frequency (%)
JUN                 2298    2.8%
MYOG                2187    2.7%
KLF5                2135    2.6%
GATA3               2007    2.5%
CEBPB               1918    2.4%
MEF2A               1650    2.0%
MSX1                1595    2.0%
KLF4                1518    1.9%
PRDM1               1506    1.8%
SOX2                1474    1.8%
Other values (202)  17642   21.6%
(Missing)           45558   55.9%
Histogram of lengths of the category
ValueCountFrequency (%)
jun2298
 
6.4%
myog2187
 
6.1%
klf52135
 
5.9%
gata32007
 
5.6%
cebpb1918
 
5.3%
mef2a1650
 
4.6%
msx11595
 
4.4%
klf41518
 
4.2%
prdm11506
 
4.2%
sox21474
 
4.1%
Other values (202)17642
49.1%

Most occurring characters

ValueCountFrequency (%)
A13613
 
8.6%
F12201
 
7.7%
110541
 
6.7%
O9976
 
6.3%
M8862
 
5.6%
X8080
 
5.1%
S7934
 
5.0%
P7643
 
4.8%
T6641
 
4.2%
36199
 
3.9%
Other values (26)66646
42.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter129870
82.0%
Decimal Number28024
 
17.7%
Dash Punctuation442
 
0.3%

Most frequent character per category

ValueCountFrequency (%)
A13613
 
10.5%
F12201
 
9.4%
O9976
 
7.7%
M8862
 
6.8%
X8080
 
6.2%
S7934
 
6.1%
P7643
 
5.9%
T6641
 
5.1%
E5844
 
4.5%
G5691
 
4.4%
Other values (15)43385
33.4%
ValueCountFrequency (%)
110541
37.6%
36199
22.1%
25369
19.2%
42546
 
9.1%
52332
 
8.3%
8739
 
2.6%
7125
 
0.4%
6113
 
0.4%
056
 
0.2%
94
 
< 0.1%
ValueCountFrequency (%)
-442
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin129870
82.0%
Common28466
 
18.0%

Most frequent character per script

ValueCountFrequency (%)
A13613
 
10.5%
F12201
 
9.4%
O9976
 
7.7%
M8862
 
6.8%
X8080
 
6.2%
S7934
 
6.1%
P7643
 
5.9%
T6641
 
5.1%
E5844
 
4.5%
G5691
 
4.4%
Other values (15)43385
33.4%
ValueCountFrequency (%)
110541
37.0%
36199
21.8%
25369
18.9%
42546
 
8.9%
52332
 
8.2%
8739
 
2.6%
-442
 
1.6%
7125
 
0.4%
6113
 
0.4%
056
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII158336
100.0%

Most frequent character per block

ValueCountFrequency (%)
A13613
 
8.6%
F12201
 
7.7%
110541
 
6.7%
O9976
 
6.3%
M8862
 
5.6%
X8080
 
5.1%
S7934
 
5.0%
P7643
 
4.8%
T6641
 
4.2%
36199
 
3.9%
Other values (26)66646
42.1%

Gene 36
Categorical

HIGH CARDINALITY
MISSING

Distinct197
Distinct (%)0.6%
Missing47060
Missing (%)57.8%
Memory size3.5 MiB
KLF5
3174 
MEF2A
 
2186
SOX2
 
2086
PAX3
 
1804
JUN
 
1650
Other values (192)
23528 

Length

Max length7
Median length4
Mean length4.441501104
Min length2

Characters and Unicode

Total characters152912
Distinct characters35
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique12
Unique (%)< 0.1%

Sample

1st rowHES6
2nd rowTFEB
3rd rowFOXK2
4th rowGFI1
5th rowMYF5
Value               Count   Frequency (%)
KLF5                3174    3.9%
MEF2A               2186    2.7%
SOX2                2086    2.6%
PAX3                1804    2.2%
JUN                 1650    2.0%
MYOD1               1519    1.9%
MSX1                1444    1.8%
CEBPB               1356    1.7%
KLF4                1277    1.6%
STAT3               1247    1.5%
Other values (187)  16685   20.5%
(Missing)           47060   57.8%
Histogram of lengths of the category
ValueCountFrequency (%)
klf53174
 
9.2%
mef2a2186
 
6.3%
sox22086
 
6.1%
pax31804
 
5.2%
jun1650
 
4.8%
myod11519
 
4.4%
msx11444
 
4.2%
cebpb1356
 
3.9%
klf41277
 
3.7%
stat31247
 
3.6%
Other values (187)16685
48.5%

Most occurring characters

ValueCountFrequency (%)
F11742
 
7.7%
111391
 
7.4%
A11334
 
7.4%
S8833
 
5.8%
X8823
 
5.8%
O8552
 
5.6%
M8049
 
5.3%
36969
 
4.6%
P6672
 
4.4%
K6000
 
3.9%
Other values (25)64547
42.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter121360
79.4%
Decimal Number30370
 
19.9%
Dash Punctuation1182
 
0.8%

Most frequent character per category

ValueCountFrequency (%)
F11742
 
9.7%
A11334
 
9.3%
S8833
 
7.3%
X8823
 
7.3%
O8552
 
7.0%
M8049
 
6.6%
P6672
 
5.5%
K6000
 
4.9%
E5908
 
4.9%
N5871
 
4.8%
Other values (15)39576
32.6%
ValueCountFrequency (%)
111391
37.5%
36969
22.9%
25974
19.7%
53407
 
11.2%
42170
 
7.1%
7178
 
0.6%
6127
 
0.4%
891
 
0.3%
063
 
0.2%
ValueCountFrequency (%)
-1182
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin121360
79.4%
Common31552
 
20.6%

Most frequent character per script

ValueCountFrequency (%)
F11742
 
9.7%
A11334
 
9.3%
S8833
 
7.3%
X8823
 
7.3%
O8552
 
7.0%
M8049
 
6.6%
P6672
 
5.5%
K6000
 
4.9%
E5908
 
4.9%
N5871
 
4.8%
Other values (15)39576
32.6%
ValueCountFrequency (%)
111391
36.1%
36969
22.1%
25974
18.9%
53407
 
10.8%
42170
 
6.9%
-1182
 
3.7%
7178
 
0.6%
6127
 
0.4%
891
 
0.3%
063
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII152912
100.0%

Most frequent character per block

ValueCountFrequency (%)
F11742
 
7.7%
111391
 
7.4%
A11334
 
7.4%
S8833
 
5.8%
X8823
 
5.8%
O8552
 
5.6%
M8049
 
5.3%
36969
 
4.6%
P6672
 
4.4%
K6000
 
3.9%
Other values (25)64547
42.2%

Gene 37
Categorical

HIGH CARDINALITY
MISSING

Distinct189
Distinct (%)0.6%
Missing48829
Missing (%)59.9%
Memory size3.4 MiB
SOX2
2980 
JUN
 
2149
KLF5
 
2071
MYOD1
 
1863
PAX3
 
1560
Other values (184)
22036 

Length

Max length7
Median length4
Mean length4.415658777
Min length2

Characters and Unicode

Total characters144211
Distinct characters36
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6
Unique (%)< 0.1%

Sample

1st rowZIC3
2nd rowNFE2L2
3rd rowCEBPD
4th rowMXI1
5th rowGFI1
Value               Count   Frequency (%)
SOX2                2980    3.7%
JUN                 2149    2.6%
KLF5                2071    2.5%
MYOD1               1863    2.3%
PAX3                1560    1.9%
MYOG                1433    1.8%
KLF4                1355    1.7%
STAT3               1269    1.6%
FOXO1               1209    1.5%
NR3C1               1010    1.2%
Other values (179)  15760   19.3%
(Missing)           48829   59.9%
Histogram of lengths of the category
ValueCountFrequency (%)
sox22980
 
9.1%
jun2149
 
6.6%
klf52071
 
6.3%
myod11863
 
5.7%
pax31560
 
4.8%
myog1433
 
4.4%
klf41355
 
4.1%
stat31269
 
3.9%
foxo11209
 
3.7%
nr3c11010
 
3.1%
Other values (179)15760
48.3%

Most occurring characters

ValueCountFrequency (%)
O11225
 
7.8%
19579
 
6.6%
A9474
 
6.6%
F9448
 
6.6%
X8902
 
6.2%
S8139
 
5.6%
M6950
 
4.8%
N6536
 
4.5%
36526
 
4.5%
R5657
 
3.9%
Other values (26)61775
42.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter116613
80.9%
Decimal Number26876
 
18.6%
Dash Punctuation722
 
0.5%

Most frequent character per category

ValueCountFrequency (%)
O11225
 
9.6%
A9474
 
8.1%
F9448
 
8.1%
X8902
 
7.6%
S8139
 
7.0%
M6950
 
6.0%
N6536
 
5.6%
R5657
 
4.9%
T5612
 
4.8%
P5352
 
4.6%
Other values (15)39318
33.7%
ValueCountFrequency (%)
19579
35.6%
36526
24.3%
25120
19.1%
52896
 
10.8%
42233
 
8.3%
6173
 
0.6%
8163
 
0.6%
7134
 
0.5%
050
 
0.2%
92
 
< 0.1%
ValueCountFrequency (%)
-722
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin116613
80.9%
Common27598
 
19.1%

Most frequent character per script

ValueCountFrequency (%)
O11225
 
9.6%
A9474
 
8.1%
F9448
 
8.1%
X8902
 
7.6%
S8139
 
7.0%
M6950
 
6.0%
N6536
 
5.6%
R5657
 
4.9%
T5612
 
4.8%
P5352
 
4.6%
Other values (15)39318
33.7%
ValueCountFrequency (%)
19579
34.7%
36526
23.6%
25120
18.6%
52896
 
10.5%
42233
 
8.1%
-722
 
2.6%
6173
 
0.6%
8163
 
0.6%
7134
 
0.5%
050
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII144211
100.0%

Most frequent character per block

ValueCountFrequency (%)
O11225
 
7.8%
19579
 
6.6%
A9474
 
6.6%
F9448
 
6.6%
X8902
 
6.2%
S8139
 
5.6%
M6950
 
4.8%
N6536
 
4.5%
36526
 
4.5%
R5657
 
3.9%
Other values (26)61775
42.8%

Gene 38
Categorical

HIGH CARDINALITY
MISSING

Distinct183
Distinct (%)0.6%
Missing50642
Missing (%)62.1%
Memory size3.4 MiB
STAT3
 
2107
SOX2
 
1954
KLF5
 
1915
MYOG
 
1705
MYOD1
 
1468
Other values (178)
21697 

Length

Max length7
Median length5
Mean length4.508266874
Min length2

Characters and Unicode

Total characters139062
Distinct characters35
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique7
Unique (%)< 0.1%

Sample

1st rowRELB
2nd rowKLF10
3rd rowNR4A1
4th rowTFEB
5th rowCEBPD
Value               Count   Frequency (%)
STAT3               2107    2.6%
SOX2                1954    2.4%
KLF5                1915    2.4%
MYOG                1705    2.1%
MYOD1               1468    1.8%
MEF2A               1433    1.8%
GATA3               1163    1.4%
FOXP3               935     1.1%
RELA                930     1.1%
PAX3                883     1.1%
Other values (173)  16353   20.1%
(Missing)           50642   62.1%
Histogram of lengths of the category
ValueCountFrequency (%)
stat32107
 
6.8%
sox21954
 
6.3%
klf51915
 
6.2%
myog1705
 
5.5%
myod11468
 
4.8%
mef2a1433
 
4.6%
gata31163
 
3.8%
foxp3935
 
3.0%
rela930
 
3.0%
pax3883
 
2.9%
Other values (173)16353
53.0%

Most occurring characters

ValueCountFrequency (%)
A12115
 
8.7%
F10735
 
7.7%
O9950
 
7.2%
T8210
 
5.9%
S7631
 
5.5%
17351
 
5.3%
X6706
 
4.8%
36587
 
4.7%
R6332
 
4.6%
M6331
 
4.6%
Other values (25)57114
41.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter114446
82.3%
Decimal Number24241
 
17.4%
Dash Punctuation375
 
0.3%

Most frequent character per category

ValueCountFrequency (%)
A12115
 
10.6%
F10735
 
9.4%
O9950
 
8.7%
T8210
 
7.2%
S7631
 
6.7%
X6706
 
5.9%
R6332
 
5.5%
M6331
 
5.5%
E6315
 
5.5%
L5673
 
5.0%
Other values (15)34448
30.1%
ValueCountFrequency (%)
17351
30.3%
36587
27.2%
25048
20.8%
52665
 
11.0%
41806
 
7.5%
8320
 
1.3%
6225
 
0.9%
0145
 
0.6%
794
 
0.4%
ValueCountFrequency (%)
-375
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin114446
82.3%
Common24616
 
17.7%

Most frequent character per script

ValueCountFrequency (%)
A12115
 
10.6%
F10735
 
9.4%
O9950
 
8.7%
T8210
 
7.2%
S7631
 
6.7%
X6706
 
5.9%
R6332
 
5.5%
M6331
 
5.5%
E6315
 
5.5%
L5673
 
5.0%
Other values (15)34448
30.1%
ValueCountFrequency (%)
17351
29.9%
36587
26.8%
25048
20.5%
52665
 
10.8%
41806
 
7.3%
-375
 
1.5%
8320
 
1.3%
6225
 
0.9%
0145
 
0.6%
794
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII139062
100.0%

Most frequent character per block

ValueCountFrequency (%)
A12115
 
8.7%
F10735
 
7.7%
O9950
 
7.2%
T8210
 
5.9%
S7631
 
5.5%
17351
 
5.3%
X6706
 
4.8%
36587
 
4.7%
R6332
 
4.6%
M6331
 
4.6%
Other values (25)57114
41.1%

Gene 39
Categorical

HIGH CARDINALITY
MISSING

Distinct179
Distinct (%)0.6%
Missing52723
Missing (%)64.7%
Memory size3.3 MiB
MEF2A
 
1704
STAT3
 
1575
SOX2
 
1511
GATA3
 
1463
JUN
 
1403
Other values (174)
21109 

Length

Max length7
Median length5
Mean length4.483921432
Min length2

Characters and Unicode

Total characters128980
Distinct characters35
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6
Unique (%)< 0.1%

Sample

1st rowTBX5
2nd rowSIX6
3rd rowBHLHE41
4th rowDBP
5th rowFOXE3
Value               Count   Frequency (%)
MEF2A               1704    2.1%
STAT3               1575    1.9%
SOX2                1511    1.9%
GATA3               1463    1.8%
JUN                 1403    1.7%
MYOG                1361    1.7%
NFATC1              1176    1.4%
RORA                1175    1.4%
KLF5                986     1.2%
MYOD1               887     1.1%
Other values (169)  15524   19.1%
(Missing)           52723   64.7%
Histogram of lengths of the category
ValueCountFrequency (%)
mef2a1704
 
5.9%
stat31575
 
5.5%
sox21511
 
5.3%
gata31463
 
5.1%
jun1403
 
4.9%
myog1361
 
4.7%
nfatc11176
 
4.1%
rora1175
 
4.1%
klf5986
 
3.4%
myod1887
 
3.1%
Other values (169)15524
54.0%

Most occurring characters

ValueCountFrequency (%)
A13770
 
10.7%
F10124
 
7.8%
O8252
 
6.4%
T8009
 
6.2%
R8004
 
6.2%
16429
 
5.0%
S6402
 
5.0%
26063
 
4.7%
E6061
 
4.7%
N6041
 
4.7%
Other values (25)49825
38.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter107205
83.1%
Decimal Number21476
 
16.7%
Dash Punctuation299
 
0.2%

Most frequent character per category

ValueCountFrequency (%)
A13770
12.8%
F10124
 
9.4%
O8252
 
7.7%
T8009
 
7.5%
R8004
 
7.5%
S6402
 
6.0%
E6061
 
5.7%
N6041
 
5.6%
M5634
 
5.3%
X4424
 
4.1%
Other values (15)30484
28.4%
ValueCountFrequency (%)
16429
29.9%
26063
28.2%
35329
24.8%
51448
 
6.7%
41408
 
6.6%
6242
 
1.1%
8235
 
1.1%
0167
 
0.8%
7155
 
0.7%
ValueCountFrequency (%)
-299
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin107205
83.1%
Common21775
 
16.9%

Most frequent character per script

ValueCountFrequency (%)
A13770
12.8%
F10124
 
9.4%
O8252
 
7.7%
T8009
 
7.5%
R8004
 
7.5%
S6402
 
6.0%
E6061
 
5.7%
N6041
 
5.6%
M5634
 
5.3%
X4424
 
4.1%
Other values (15)30484
28.4%
ValueCountFrequency (%)
16429
29.5%
26063
27.8%
35329
24.5%
51448
 
6.6%
41408
 
6.5%
-299
 
1.4%
6242
 
1.1%
8235
 
1.1%
0167
 
0.8%
7155
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII128980
100.0%

Most frequent character per block

ValueCountFrequency (%)
A13770
 
10.7%
F10124
 
7.8%
O8252
 
6.4%
T8009
 
6.2%
R8004
 
6.2%
16429
 
5.0%
S6402
 
5.0%
26063
 
4.7%
E6061
 
4.7%
N6041
 
4.7%
Other values (25)49825
38.6%

Gene 40
Categorical

HIGH CARDINALITY
MISSING

Distinct178
Distinct (%)0.7%
Missing54995
Missing (%)67.5%
Memory size3.2 MiB
JUN
 
1541
SOX2
 
1450
MEF2A
 
1361
NFATC1
 
1107
KLF5
 
1010
Other values (173)
20024 

Length

Max length7
Median length5
Mean length4.522062432
Min length2

Characters and Unicode

Total characters119803
Distinct characters35
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique4
Unique (%)< 0.1%

Sample

1st rowJUNB
2nd rowZNF281
3rd rowSIX2
4th rowFOXA3
5th rowMYF5
Value               Count   Frequency (%)
JUN                 1541    1.9%
SOX2                1450    1.8%
MEF2A               1361    1.7%
NFATC1              1107    1.4%
KLF5                1010    1.2%
ESR1                939     1.2%
ARNTL               915     1.1%
NR3C1               895     1.1%
RORA                823     1.0%
MYOG                821     1.0%
Other values (168)  15631   19.2%
(Missing)           54995   67.5%
Histogram of lengths of the category
ValueCountFrequency (%)
jun1541
 
5.8%
sox21450
 
5.5%
mef2a1361
 
5.1%
nfatc11107
 
4.2%
klf51010
 
3.8%
esr1939
 
3.5%
arntl915
 
3.5%
nr3c1895
 
3.4%
rora823
 
3.1%
myog821
 
3.1%
Other values (168)15631
59.0%

Most occurring characters

ValueCountFrequency (%)
A11541
 
9.6%
F9274
 
7.7%
N7730
 
6.5%
R7553
 
6.3%
17135
 
6.0%
S6723
 
5.6%
26237
 
5.2%
O6187
 
5.2%
T6144
 
5.1%
E5616
 
4.7%
Other values (25)45663
38.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter97878
81.7%
Decimal Number21490
 
17.9%
Dash Punctuation435
 
0.4%

Most frequent character per category

ValueCountFrequency (%)
A11541
11.8%
F9274
 
9.5%
N7730
 
7.9%
R7553
 
7.7%
S6723
 
6.9%
O6187
 
6.3%
T6144
 
6.3%
E5616
 
5.7%
C4498
 
4.6%
M4354
 
4.4%
Other values (15)28258
28.9%
ValueCountFrequency (%)
17135
33.2%
26237
29.0%
33744
17.4%
51797
 
8.4%
41738
 
8.1%
6321
 
1.5%
8223
 
1.0%
0153
 
0.7%
7142
 
0.7%
ValueCountFrequency (%)
-435
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin97878
81.7%
Common21925
 
18.3%

Most frequent character per script

ValueCountFrequency (%)
A11541
11.8%
F9274
 
9.5%
N7730
 
7.9%
R7553
 
7.7%
S6723
 
6.9%
O6187
 
6.3%
T6144
 
6.3%
E5616
 
5.7%
C4498
 
4.6%
M4354
 
4.4%
Other values (15)28258
28.9%
ValueCountFrequency (%)
17135
32.5%
26237
28.4%
33744
17.1%
51797
 
8.2%
41738
 
7.9%
-435
 
2.0%
6321
 
1.5%
8223
 
1.0%
0153
 
0.7%
7142
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII119803
100.0%

Most frequent character per block

ValueCountFrequency (%)
A11541
 
9.6%
F9274
 
7.7%
N7730
 
6.5%
R7553
 
6.3%
17135
 
6.0%
S6723
 
5.6%
26237
 
5.2%
O6187
 
5.2%
T6144
 
5.1%
E5616
 
4.7%
Other values (25)45663
38.1%

Gene 41
Categorical

HIGH CARDINALITY
MISSING

Distinct174
Distinct (%)0.7%
Missing57565
Missing (%)70.6%
Memory size3.2 MiB
NFATC1
 
1583
SOX2
 
1387
KLF5
 
1374
JUN
 
1268
STAT3
 
1058
Other values (169)
17253 

Length

Max length7
Median length5
Mean length4.6129248
Min length3

Characters and Unicode

Total characters110355
Distinct characters35
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6
Unique (%)< 0.1%

Sample

1st rowPAX3
2nd rowRELB
3rd rowSOX3
4th rowFOXK2
5th rowNR1H3
Value               Count   Frequency (%)
NFATC1              1583    1.9%
SOX2                1387    1.7%
KLF5                1374    1.7%
JUN                 1268    1.6%
STAT3               1058    1.3%
NR3C1               1020    1.3%
MEF2C               826     1.0%
RELA                823     1.0%
MEF2A               821     1.0%
GATA3               680     0.8%
Other values (164)  13083   16.1%
(Missing)           57565   70.6%
Histogram of lengths of the category
ValueCountFrequency (%)
nfatc11583
 
6.6%
sox21387
 
5.8%
klf51374
 
5.7%
jun1268
 
5.3%
stat31058
 
4.4%
nr3c11020
 
4.3%
mef2c826
 
3.5%
rela823
 
3.4%
mef2a821
 
3.4%
gata3680
 
2.8%
Other values (164)13083
54.7%

Most occurring characters

ValueCountFrequency (%)
A11076
 
10.0%
F8649
 
7.8%
16763
 
6.1%
R6627
 
6.0%
T6577
 
6.0%
N6479
 
5.9%
S5466
 
5.0%
O5437
 
4.9%
E5281
 
4.8%
L4894
 
4.4%
Other values (25)43106
39.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter89880
81.4%
Decimal Number20167
 
18.3%
Dash Punctuation308
 
0.3%

Most frequent character per category

ValueCountFrequency (%)
A11076
12.3%
F8649
 
9.6%
R6627
 
7.4%
T6577
 
7.3%
N6479
 
7.2%
S5466
 
6.1%
O5437
 
6.0%
E5281
 
5.9%
L4894
 
5.4%
C4810
 
5.4%
Other values (15)24584
27.4%
ValueCountFrequency (%)
16763
33.5%
24701
23.3%
34116
20.4%
41831
 
9.1%
51812
 
9.0%
6367
 
1.8%
0346
 
1.7%
7143
 
0.7%
888
 
0.4%
ValueCountFrequency (%)
-308
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin89880
81.4%
Common20475
 
18.6%

Most frequent character per script

ValueCountFrequency (%)
A11076
12.3%
F8649
 
9.6%
R6627
 
7.4%
T6577
 
7.3%
N6479
 
7.2%
S5466
 
6.1%
O5437
 
6.0%
E5281
 
5.9%
L4894
 
5.4%
C4810
 
5.4%
Other values (15)24584
27.4%
ValueCountFrequency (%)
16763
33.0%
24701
23.0%
34116
20.1%
41831
 
8.9%
51812
 
8.8%
6367
 
1.8%
0346
 
1.7%
-308
 
1.5%
7143
 
0.7%
888
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII110355
100.0%

Most frequent character per block

ValueCountFrequency (%)
A11076
 
10.0%
F8649
 
7.8%
16763
 
6.1%
R6627
 
6.0%
T6577
 
6.0%
N6479
 
5.9%
S5466
 
5.0%
O5437
 
4.9%
E5281
 
4.8%
L4894
 
4.4%
Other values (25)43106
39.1%

Gene 42
Categorical

HIGH CARDINALITY
MISSING

Distinct171
Distinct (%)0.8%
Missing60190
Missing (%)73.9%
Memory size3.1 MiB
KLF5
 
1094
SOX2
 
1080
STAT3
 
1033
RELA
 
921
NFATC1
 
904
Other values (166)
16266 

Length

Max length7
Median length5
Mean length4.526387454
Min length2

Characters and Unicode

Total characters96403
Distinct characters35
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique5
Unique (%)< 0.1%

Sample

1st rowHES6
2nd rowKLF15
3rd rowETV1
4th rowKLF15
5th rowPOU2F1
Value               Count   Frequency (%)
KLF5                1094    1.3%
SOX2                1080    1.3%
STAT3               1033    1.3%
RELA                921     1.1%
NFATC1              904     1.1%
MEF2C               828     1.0%
JUN                 800     1.0%
REL                 742     0.9%
SP1                 695     0.9%
PPARA               659     0.8%
Other values (161)  12542   15.4%
(Missing)           60190   73.9%
Histogram of lengths of the category
ValueCountFrequency (%)
klf51094
 
5.1%
sox21080
 
5.1%
stat31033
 
4.9%
rela921
 
4.3%
nfatc1904
 
4.2%
mef2c828
 
3.9%
jun800
 
3.8%
rel742
 
3.5%
sp1695
 
3.3%
ppara659
 
3.1%
Other values (161)12542
58.9%

Most occurring characters

ValueCountFrequency (%)
A8790
 
9.1%
R6914
 
7.2%
F6800
 
7.1%
15956
 
6.2%
E5920
 
6.1%
S5658
 
5.9%
T5144
 
5.3%
L4811
 
5.0%
N4655
 
4.8%
O4574
 
4.7%
Other values (25)37181
38.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter79896
82.9%
Decimal Number16436
 
17.0%
Dash Punctuation71
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
A8790
 
11.0%
R6914
 
8.7%
F6800
 
8.5%
E5920
 
7.4%
S5658
 
7.1%
T5144
 
6.4%
L4811
 
6.0%
N4655
 
5.8%
O4574
 
5.7%
C4123
 
5.2%
Other values (15)22507
28.2%
ValueCountFrequency (%)
15956
36.2%
24116
25.0%
33038
18.5%
51536
 
9.3%
41007
 
6.1%
6272
 
1.7%
0225
 
1.4%
7210
 
1.3%
876
 
0.5%
ValueCountFrequency (%)
-71
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin79896
82.9%
Common16507
 
17.1%

Most frequent character per script

ValueCountFrequency (%)
A8790
 
11.0%
R6914
 
8.7%
F6800
 
8.5%
E5920
 
7.4%
S5658
 
7.1%
T5144
 
6.4%
L4811
 
6.0%
N4655
 
5.8%
O4574
 
5.7%
C4123
 
5.2%
Other values (15)22507
28.2%
ValueCountFrequency (%)
15956
36.1%
24116
24.9%
33038
18.4%
51536
 
9.3%
41007
 
6.1%
6272
 
1.6%
0225
 
1.4%
7210
 
1.3%
876
 
0.5%
-71
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII96403
100.0%

Most frequent character per block

ValueCountFrequency (%)
A8790
 
9.1%
R6914
 
7.2%
F6800
 
7.1%
15956
 
6.2%
E5920
 
6.1%
S5658
 
5.9%
T5144
 
5.3%
L4811
 
5.0%
N4655
 
4.8%
O4574
 
4.7%
Other values (25)37181
38.6%

Gene 43
Categorical

HIGH CARDINALITY
MISSING

Distinct175
Distinct (%)0.9%
Missing62818
Missing (%)77.1%
Memory size3.0 MiB
KLF5
 
993
SOX2
 
981
REL
 
828
STAT3
 
667
FOXO3
 
647
Other values (170)
14554 

Length

Max length7
Median length5
Mean length4.53304767
Min length3

Characters and Unicode

Total characters84632
Distinct characters36
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique7
Unique (%)< 0.1%

Sample

1st rowLIN28B
2nd rowSREBF1
3rd rowHEYL
4th rowPROX1
5th rowZBTB7A
Value               Count   Frequency (%)
KLF5                993     1.2%
SOX2                981     1.2%
REL                 828     1.0%
STAT3               667     0.8%
FOXO3               647     0.8%
NR3C1               626     0.8%
NFATC1              572     0.7%
NFATC2              543     0.7%
RORA                542     0.7%
NPAS2               535     0.7%
Other values (165)  11736   14.4%
(Missing)           62818   77.1%
Histogram of lengths of the category
ValueCountFrequency (%)
klf5993
 
5.3%
sox2981
 
5.3%
rel828
 
4.4%
stat3667
 
3.6%
foxo3647
 
3.5%
nr3c1626
 
3.4%
nfatc1572
 
3.1%
nfatc2543
 
2.9%
rora542
 
2.9%
npas2535
 
2.9%
Other values (165)11736
62.9%

Most occurring characters

ValueCountFrequency (%)
A7489
 
8.8%
R6315
 
7.5%
F6087
 
7.2%
O4957
 
5.9%
S4822
 
5.7%
14511
 
5.3%
N4369
 
5.2%
T4137
 
4.9%
L4091
 
4.8%
E4037
 
4.8%
Other values (26)33817
40.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter69636
82.3%
Decimal Number14929
 
17.6%
Dash Punctuation67
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
A7489
 
10.8%
R6315
 
9.1%
F6087
 
8.7%
O4957
 
7.1%
S4822
 
6.9%
N4369
 
6.3%
T4137
 
5.9%
L4091
 
5.9%
E4037
 
5.8%
C3589
 
5.2%
Other values (15)19743
28.4%
ValueCountFrequency (%)
14511
30.2%
23802
25.5%
33075
20.6%
51599
 
10.7%
41067
 
7.1%
6340
 
2.3%
7219
 
1.5%
0219
 
1.5%
895
 
0.6%
92
 
< 0.1%
ValueCountFrequency (%)
-67
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin69636
82.3%
Common14996
 
17.7%

Most frequent character per script

ValueCountFrequency (%)
A7489
 
10.8%
R6315
 
9.1%
F6087
 
8.7%
O4957
 
7.1%
S4822
 
6.9%
N4369
 
6.3%
T4137
 
5.9%
L4091
 
5.9%
E4037
 
5.8%
C3589
 
5.2%
Other values (15)19743
28.4%
ValueCountFrequency (%)
14511
30.1%
23802
25.4%
33075
20.5%
51599
 
10.7%
41067
 
7.1%
6340
 
2.3%
7219
 
1.5%
0219
 
1.5%
895
 
0.6%
-67
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII84632
100.0%

Most frequent character per block

ValueCountFrequency (%)
A7489
 
8.8%
R6315
 
7.5%
F6087
 
7.2%
O4957
 
5.9%
S4822
 
5.7%
14511
 
5.3%
N4369
 
5.2%
T4137
 
4.9%
L4091
 
4.8%
E4037
 
4.8%
Other values (26)33817
40.0%

Gene 44
Categorical

HIGH CARDINALITY
MISSING

Distinct178
Distinct (%)1.1%
Missing65633
Missing (%)80.5%
Memory size2.9 MiB
SOX2
 
1021
FOXO1
 
770
NFATC1
 
750
STAT3
 
694
NFATC2
 
678
Other values (173)
11942 

Length

Max length7
Median length5
Mean length4.668369599
Min length3

Characters and Unicode

Total characters74017
Distinct characters36
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique9
Unique (%)0.1%

Sample

1st rowGTF3A
2nd rowGTF3A
3rd rowFOXA3
4th rowNR4A1
5th rowHES7
Value               Count   Frequency (%)
SOX2                1021    1.3%
FOXO1               770     0.9%
NFATC1              750     0.9%
STAT3               694     0.9%
NFATC2              678     0.8%
GATA3               534     0.7%
RORA                527     0.6%
KLF5                443     0.5%
PPARA               423     0.5%
ARNTL               370     0.5%
Other values (168)  9645    11.8%
(Missing)           65633   80.5%
Histogram of lengths of the category
ValueCountFrequency (%)
sox21021
 
6.4%
foxo1770
 
4.9%
nfatc1750
 
4.7%
stat3694
 
4.4%
nfatc2678
 
4.3%
gata3534
 
3.4%
rora527
 
3.3%
klf5443
 
2.8%
ppara423
 
2.7%
arntl370
 
2.3%
Other values (168)9645
60.8%

Most occurring characters

ValueCountFrequency (%)
A7881
 
10.6%
F5433
 
7.3%
O4911
 
6.6%
T4879
 
6.6%
R4576
 
6.2%
S4347
 
5.9%
14271
 
5.8%
N3766
 
5.1%
23665
 
5.0%
X3428
 
4.6%
Other values (26)26860
36.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter60903
82.3%
Decimal Number13074
 
17.7%
Dash Punctuation40
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
A7881
12.9%
F5433
 
8.9%
O4911
 
8.1%
T4879
 
8.0%
R4576
 
7.5%
S4347
 
7.1%
N3766
 
6.2%
X3428
 
5.6%
E3183
 
5.2%
C3151
 
5.2%
Other values (15)15348
25.2%
ValueCountFrequency (%)
14271
32.7%
23665
28.0%
32446
18.7%
41105
 
8.5%
5885
 
6.8%
6244
 
1.9%
7186
 
1.4%
0173
 
1.3%
897
 
0.7%
92
 
< 0.1%
ValueCountFrequency (%)
-40
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin60903
82.3%
Common13114
 
17.7%

Most frequent character per script

ValueCountFrequency (%)
A7881
12.9%
F5433
 
8.9%
O4911
 
8.1%
T4879
 
8.0%
R4576
 
7.5%
S4347
 
7.1%
N3766
 
6.2%
X3428
 
5.6%
E3183
 
5.2%
C3151
 
5.2%
Other values (15)15348
25.2%
ValueCountFrequency (%)
14271
32.6%
23665
27.9%
32446
18.7%
41105
 
8.4%
5885
 
6.7%
6244
 
1.9%
7186
 
1.4%
0173
 
1.3%
897
 
0.7%
-40
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII74017
100.0%

Most frequent character per block

ValueCountFrequency (%)
A7881
 
10.6%
F5433
 
7.3%
O4911
 
6.6%
T4879
 
6.6%
R4576
 
6.2%
S4347
 
5.9%
14271
 
5.8%
N3766
 
5.1%
23665
 
5.0%
X3428
 
4.6%
Other values (26)26860
36.3%

Gene 45
Categorical

HIGH CARDINALITY
MISSING

Distinct: 171
Distinct (%): 1.3%
Missing: 68181
Missing (%): 83.7%
Memory size: 2.9 MiB

NFATC1: 811
SOX2: 715
STAT3: 697
ARNTL: 396
KLF2: 381
Other values (166): 10307

Length

Max length: 7
Median length: 5
Mean length: 4.630269783
Min length: 3

Characters and Unicode

Total characters: 61615
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): 0.1%

Sample

1st row: PPARD
2nd row: FOXE3
3rd row: FOXO4
4th row: ZNF281
5th row: ZIC3

Value               Count   Frequency (%)
NFATC1              811     1.0%
SOX2                715     0.9%
STAT3               697     0.9%
ARNTL               396     0.5%
KLF2                381     0.5%
ESR1                362     0.4%
GATA3               352     0.4%
RORA                348     0.4%
MEF2C               346     0.4%
HES5                337     0.4%
Other values (161)  8562    10.5%
(Missing)           68181   83.7%
Histogram of lengths of the category
ValueCountFrequency (%)
nfatc1811
 
6.1%
sox2715
 
5.4%
stat3697
 
5.2%
arntl396
 
3.0%
klf2381
 
2.9%
esr1362
 
2.7%
gata3352
 
2.6%
rora348
 
2.6%
mef2c346
 
2.6%
hes5337
 
2.5%
Other values (161)8562
64.3%
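
The same count appears with two different percentages in the tables above (NFATC1: 1.0% and 6.1%). The figures are consistent with the first table dividing by all observations and the second dividing by non-missing values only. The sketch below reproduces that arithmetic from numbers in this report. The variable names are illustrative, and the choice of denominators is inferred from the figures rather than taken from the tool's documentation.

```python
# Figures copied from this report (overview and Gene 45 summary).
total_rows = 81488   # "Number of observations"
missing = 68181      # Gene 45 "Missing"
distinct = 171       # Gene 45 "Distinct"
count = 811          # NFATC1 count in Gene 45

non_missing = total_rows - missing

# Assumed denominators, inferred from the reported percentages.
missing_pct = 100 * missing / total_rows        # share of all rows
share_of_all = 100 * count / total_rows         # first frequency table
share_of_present = 100 * count / non_missing    # second frequency table
distinct_pct = 100 * distinct / non_missing     # "Distinct (%)"

print(f"non-missing rows: {non_missing}")
print(f"Missing (%):      {missing_pct:.1f}")
print(f"NFATC1, all rows: {share_of_all:.1f}")
print(f"NFATC1, present:  {share_of_present:.1f}")
print(f"Distinct (%):     {distinct_pct:.1f}")
```

Under these assumptions the computed values round to the percentages shown for Gene 45, which suggests that "Distinct (%)" is also taken relative to non-missing values.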

Most occurring characters

ValueCountFrequency (%)
A5672
 
9.2%
F4339
 
7.0%
T4207
 
6.8%
14094
 
6.6%
S3882
 
6.3%
R3574
 
5.8%
N3412
 
5.5%
23148
 
5.1%
O3137
 
5.1%
E3105
 
5.0%
Other values (25)23045
37.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter50395
81.8%
Decimal Number11155
 
18.1%
Dash Punctuation65
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
A5672
11.3%
F4339
 
8.6%
T4207
 
8.3%
S3882
 
7.7%
R3574
 
7.1%
N3412
 
6.8%
O3137
 
6.2%
E3105
 
6.2%
C2976
 
5.9%
L2351
 
4.7%
Other values (15)13740
27.3%
ValueCountFrequency (%)
14094
36.7%
23148
28.2%
31973
17.7%
4649
 
5.8%
5618
 
5.5%
6247
 
2.2%
7174
 
1.6%
0144
 
1.3%
8108
 
1.0%
ValueCountFrequency (%)
-65
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin50395
81.8%
Common11220
 
18.2%

Most frequent character per script

ValueCountFrequency (%)
A5672
11.3%
F4339
 
8.6%
T4207
 
8.3%
S3882
 
7.7%
R3574
 
7.1%
N3412
 
6.8%
O3137
 
6.2%
E3105
 
6.2%
C2976
 
5.9%
L2351
 
4.7%
Other values (15)13740
27.3%
ValueCountFrequency (%)
14094
36.5%
23148
28.1%
31973
17.6%
4649
 
5.8%
5618
 
5.5%
6247
 
2.2%
7174
 
1.6%
0144
 
1.3%
8108
 
1.0%
-65
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII61615
100.0%

Most frequent character per block

ValueCountFrequency (%)
A5672
 
9.2%
F4339
 
7.0%
T4207
 
6.8%
14094
 
6.6%
S3882
 
6.3%
R3574
 
5.8%
N3412
 
5.5%
23148
 
5.1%
O3137
 
5.1%
E3105
 
5.0%
Other values (25)23045
37.4%

Gene 46
Categorical

HIGH CARDINALITY
MISSING

Distinct: 161
Distinct (%): 1.5%
Missing: 70511
Missing (%): 86.5%
Memory size: 2.8 MiB

STAT3: 547
NFATC1: 484
ASCL1: 461
RORA: 366
GATA4: 362
Other values (156): 8757

Length

Max length: 7
Median length: 5
Mean length: 4.687983966
Min length: 3

Characters and Unicode

Total characters: 51460
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
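
The length and character totals reported for this variable appear to be linked: dividing "Total characters" by the number of non-missing values gives the reported mean length. The sketch below checks that relationship with figures from this report and then computes the same four length statistics on the five sample rows of Gene 46. The variable names are illustrative, and the sample is far too small to reproduce the column-wide statistics.

```python
from statistics import mean, median

# Figures copied from this report (overview and Gene 46 summary).
total_rows = 81488
missing = 70511
total_characters = 51460

non_missing = total_rows - missing
mean_length = total_characters / non_missing  # compare with "Mean length"

# The same statistics computed directly on the five sample rows.
sample = ["MXI1", "HEY1", "PPARD", "RELB", "ONECUT3"]
lengths = [len(value) for value in sample]
sample_stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}

print(f"mean length from totals: {mean_length:.9f}")
print(sample_stats)
```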

Unique

Unique: 9
Unique (%): 0.1%

Sample

1st row: MXI1
2nd row: HEY1
3rd row: PPARD
4th row: RELB
5th row: ONECUT3

Value               Count   Frequency (%)
STAT3               547     0.7%
NFATC1              484     0.6%
ASCL1               461     0.6%
RORA                366     0.4%
GATA4               362     0.4%
GATA3               322     0.4%
MEF2D               295     0.4%
FOXO3               284     0.3%
MEF2C               270     0.3%
SOX2                247     0.3%
Other values (151)  7339    9.0%
(Missing)           70511   86.5%
Histogram of lengths of the category
ValueCountFrequency (%)
stat3547
 
5.0%
nfatc1484
 
4.4%
ascl1461
 
4.2%
rora366
 
3.3%
gata4362
 
3.3%
gata3322
 
2.9%
mef2d295
 
2.7%
foxo3284
 
2.6%
mef2c270
 
2.5%
sox2247
 
2.3%
Other values (151)7339
66.9%

Most occurring characters

ValueCountFrequency (%)
A5234
 
10.2%
F3556
 
6.9%
T3476
 
6.8%
13271
 
6.4%
R3231
 
6.3%
S2960
 
5.8%
E2552
 
5.0%
C2515
 
4.9%
O2511
 
4.9%
N2436
 
4.7%
Other values (25)19718
38.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter42072
81.8%
Decimal Number9347
 
18.2%
Dash Punctuation41
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
A5234
12.4%
F3556
 
8.5%
T3476
 
8.3%
R3231
 
7.7%
S2960
 
7.0%
E2552
 
6.1%
C2515
 
6.0%
O2511
 
6.0%
N2436
 
5.8%
L1941
 
4.6%
Other values (15)11660
27.7%
ValueCountFrequency (%)
13271
35.0%
22107
22.5%
31894
20.3%
4891
 
9.5%
5602
 
6.4%
7187
 
2.0%
6173
 
1.9%
0147
 
1.6%
875
 
0.8%
ValueCountFrequency (%)
-41
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin42072
81.8%
Common9388
 
18.2%

Most frequent character per script

ValueCountFrequency (%)
A5234
12.4%
F3556
 
8.5%
T3476
 
8.3%
R3231
 
7.7%
S2960
 
7.0%
E2552
 
6.1%
C2515
 
6.0%
O2511
 
6.0%
N2436
 
5.8%
L1941
 
4.6%
Other values (15)11660
27.7%
ValueCountFrequency (%)
13271
34.8%
22107
22.4%
31894
20.2%
4891
 
9.5%
5602
 
6.4%
7187
 
2.0%
6173
 
1.8%
0147
 
1.6%
875
 
0.8%
-41
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII51460
100.0%

Most frequent character per block

ValueCountFrequency (%)
A5234
 
10.2%
F3556
 
6.9%
T3476
 
6.8%
13271
 
6.4%
R3231
 
6.3%
S2960
 
5.8%
E2552
 
5.0%
C2515
 
4.9%
O2511
 
4.9%
N2436
 
4.7%
Other values (25)19718
38.3%

Gene 47
Categorical

HIGH CARDINALITY
MISSING

Distinct: 151
Distinct (%): 1.7%
Missing: 72774
Missing (%): 89.3%
Memory size: 2.7 MiB

RBPJ: 419
PPARA: 411
ASCL1: 401
MEF2C: 384
FOXO1: 298
Other values (146): 6801

Length

Max length: 7
Median length: 5
Mean length: 4.697268763
Min length: 3

Characters and Unicode

Total characters: 40932
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): 0.1%

Sample

1st row: NR4A1
2nd row: NR4A1
3rd row: HEY2
4th row: SREBF2
5th row: PHOX2A

Value               Count   Frequency (%)
RBPJ                419     0.5%
PPARA               411     0.5%
ASCL1               401     0.5%
MEF2C               384     0.5%
FOXO1               298     0.4%
ARNTL               246     0.3%
NFATC1              232     0.3%
RORA                232     0.3%
GATA3               229     0.3%
ESRRA               228     0.3%
Other values (141)  5634    6.9%
(Missing)           72774   89.3%
Histogram of lengths of the category
ValueCountFrequency (%)
rbpj419
 
4.8%
ppara411
 
4.7%
ascl1401
 
4.6%
mef2c384
 
4.4%
foxo1298
 
3.4%
arntl246
 
2.8%
nfatc1232
 
2.7%
rora232
 
2.7%
gata3229
 
2.6%
esrra228
 
2.6%
Other values (141)5634
64.7%

Most occurring characters

ValueCountFrequency (%)
A4541
 
11.1%
R3193
 
7.8%
F2785
 
6.8%
12577
 
6.3%
P2265
 
5.5%
E2163
 
5.3%
S2160
 
5.3%
T2046
 
5.0%
C1913
 
4.7%
O1783
 
4.4%
Other values (25)15506
37.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter34033
83.1%
Decimal Number6873
 
16.8%
Dash Punctuation26
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
A4541
13.3%
R3193
 
9.4%
F2785
 
8.2%
P2265
 
6.7%
E2163
 
6.4%
S2160
 
6.3%
T2046
 
6.0%
C1913
 
5.6%
O1783
 
5.2%
N1776
 
5.2%
Other values (15)9408
27.6%
ValueCountFrequency (%)
12577
37.5%
21658
24.1%
31027
 
14.9%
4637
 
9.3%
5365
 
5.3%
7226
 
3.3%
6224
 
3.3%
0109
 
1.6%
850
 
0.7%
ValueCountFrequency (%)
-26
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin34033
83.1%
Common6899
 
16.9%

Most frequent character per script

ValueCountFrequency (%)
A4541
13.3%
R3193
 
9.4%
F2785
 
8.2%
P2265
 
6.7%
E2163
 
6.4%
S2160
 
6.3%
T2046
 
6.0%
C1913
 
5.6%
O1783
 
5.2%
N1776
 
5.2%
Other values (15)9408
27.6%
ValueCountFrequency (%)
12577
37.4%
21658
24.0%
31027
 
14.9%
4637
 
9.2%
5365
 
5.3%
7226
 
3.3%
6224
 
3.2%
0109
 
1.6%
850
 
0.7%
-26
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII40932
100.0%

Most frequent character per block

ValueCountFrequency (%)
A4541
 
11.1%
R3193
 
7.8%
F2785
 
6.8%
12577
 
6.3%
P2265
 
5.5%
E2163
 
5.3%
S2160
 
5.3%
T2046
 
5.0%
C1913
 
4.7%
O1783
 
4.4%
Other values (25)15506
37.9%

Gene 48
Categorical

HIGH CARDINALITY
MISSING

Distinct: 144
Distinct (%): 2.1%
Missing: 74658
Missing (%): 91.6%
Memory size: 2.7 MiB

NPAS2: 387
ASCL1: 369
RBPJ: 360
PPARA: 321
HES5: 248
Other values (139): 5145

Length

Max length: 7
Median length: 5
Mean length: 4.665300146
Min length: 3

Characters and Unicode

Total characters: 31864
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: GRHL1
2nd row: KLF15
3rd row: POU4F3
4th row: EGR3
5th row: MESP1

Value               Count   Frequency (%)
NPAS2               387     0.5%
ASCL1               369     0.5%
RBPJ                360     0.4%
PPARA               321     0.4%
HES5                248     0.3%
NFATC1              231     0.3%
MYC                 181     0.2%
MEF2C               161     0.2%
GATA3               158     0.2%
HNF1A               147     0.2%
Other values (134)  4267    5.2%
(Missing)           74658   91.6%
Histogram of lengths of the category
ValueCountFrequency (%)
npas2387
 
5.7%
ascl1369
 
5.4%
rbpj360
 
5.3%
ppara321
 
4.7%
hes5248
 
3.6%
nfatc1231
 
3.4%
myc181
 
2.7%
mef2c161
 
2.4%
gata3158
 
2.3%
hnf1a147
 
2.2%
Other values (134)4267
62.5%

Most occurring characters

ValueCountFrequency (%)
A3407
 
10.7%
R2263
 
7.1%
12126
 
6.7%
P2042
 
6.4%
S1913
 
6.0%
F1782
 
5.6%
N1700
 
5.3%
E1633
 
5.1%
C1534
 
4.8%
21496
 
4.7%
Other values (25)11968
37.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter26147
82.1%
Decimal Number5663
 
17.8%
Dash Punctuation54
 
0.2%

Most frequent character per category

ValueCountFrequency (%)
A3407
13.0%
R2263
 
8.7%
P2042
 
7.8%
S1913
 
7.3%
F1782
 
6.8%
N1700
 
6.5%
E1633
 
6.2%
C1534
 
5.9%
T1320
 
5.0%
L1288
 
4.9%
Other values (15)7265
27.8%
ValueCountFrequency (%)
12126
37.5%
21496
26.4%
3762
 
13.5%
4490
 
8.7%
5334
 
5.9%
7189
 
3.3%
6163
 
2.9%
071
 
1.3%
832
 
0.6%
ValueCountFrequency (%)
-54
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin26147
82.1%
Common5717
 
17.9%

Most frequent character per script

ValueCountFrequency (%)
A3407
13.0%
R2263
 
8.7%
P2042
 
7.8%
S1913
 
7.3%
F1782
 
6.8%
N1700
 
6.5%
E1633
 
6.2%
C1534
 
5.9%
T1320
 
5.0%
L1288
 
4.9%
Other values (15)7265
27.8%
ValueCountFrequency (%)
12126
37.2%
21496
26.2%
3762
 
13.3%
4490
 
8.6%
5334
 
5.8%
7189
 
3.3%
6163
 
2.9%
071
 
1.2%
-54
 
0.9%
832
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII31864
100.0%

Most frequent character per block

ValueCountFrequency (%)
A3407
 
10.7%
R2263
 
7.1%
12126
 
6.7%
P2042
 
6.4%
S1913
 
6.0%
F1782
 
5.6%
N1700
 
5.3%
E1633
 
5.1%
C1534
 
4.8%
21496
 
4.7%
Other values (25)11968
37.6%

Gene 49
Categorical

HIGH CARDINALITY
MISSING

Distinct: 143
Distinct (%): 2.8%
Missing: 76305
Missing (%): 93.6%
Memory size: 2.6 MiB

RBPJ: 339
NPAS2: 252
GATA3: 241
RORA: 204
HES5: 161
Other values (138): 3986

Length

Max length: 7
Median length: 5
Mean length: 4.683195061
Min length: 3

Characters and Unicode

Total characters: 24273
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 17
Unique (%): 0.3%

Sample

1st row: DBP
2nd row: SP110
3rd row: PHOX2A
4th row: ESR2
5th row: FOXO4

Value               Count   Frequency (%)
RBPJ                339     0.4%
NPAS2               252     0.3%
GATA3               241     0.3%
RORA                204     0.3%
HES5                161     0.2%
NFATC1              149     0.2%
MEF2C               143     0.2%
GATA4               132     0.2%
ASCL1               127     0.2%
HNF1A               96      0.1%
Other values (133)  3339    4.1%
(Missing)           76305   93.6%
Histogram of lengths of the category
ValueCountFrequency (%)
rbpj339
 
6.5%
npas2252
 
4.9%
gata3241
 
4.6%
rora204
 
3.9%
hes5161
 
3.1%
nfatc1149
 
2.9%
mef2c143
 
2.8%
gata4132
 
2.5%
ascl1127
 
2.5%
hnf1a96
 
1.9%
Other values (133)3339
64.4%

Most occurring characters

ValueCountFrequency (%)
A2555
 
10.5%
R1760
 
7.3%
11587
 
6.5%
F1373
 
5.7%
P1365
 
5.6%
E1317
 
5.4%
S1249
 
5.1%
T1209
 
5.0%
N1114
 
4.6%
21076
 
4.4%
Other values (25)9668
39.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter19747
81.4%
Decimal Number4499
 
18.5%
Dash Punctuation27
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
A2555
12.9%
R1760
 
8.9%
F1373
 
7.0%
P1365
 
6.9%
E1317
 
6.7%
S1249
 
6.3%
T1209
 
6.1%
N1114
 
5.6%
C942
 
4.8%
H905
 
4.6%
Other values (15)5958
30.2%
ValueCountFrequency (%)
11587
35.3%
21076
23.9%
3679
15.1%
4491
 
10.9%
5241
 
5.4%
7182
 
4.0%
6141
 
3.1%
070
 
1.6%
832
 
0.7%
ValueCountFrequency (%)
-27
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin19747
81.4%
Common4526
 
18.6%

Most frequent character per script

ValueCountFrequency (%)
A2555
12.9%
R1760
 
8.9%
F1373
 
7.0%
P1365
 
6.9%
E1317
 
6.7%
S1249
 
6.3%
T1209
 
6.1%
N1114
 
5.6%
C942
 
4.8%
H905
 
4.6%
Other values (15)5958
30.2%
ValueCountFrequency (%)
11587
35.1%
21076
23.8%
3679
15.0%
4491
 
10.8%
5241
 
5.3%
7182
 
4.0%
6141
 
3.1%
070
 
1.5%
832
 
0.7%
-27
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII24273
100.0%

Most frequent character per block

ValueCountFrequency (%)
A2555
 
10.5%
R1760
 
7.3%
11587
 
6.5%
F1373
 
5.7%
P1365
 
5.6%
E1317
 
5.4%
S1249
 
5.1%
T1209
 
5.0%
N1114
 
4.6%
21076
 
4.4%
Other values (25)9668
39.8%

Gene 50
Categorical

HIGH CARDINALITY
MISSING

Distinct: 138
Distinct (%): 3.8%
Missing: 77850
Missing (%): 95.5%
Memory size: 2.6 MiB

HES5: 200
ARNTL: 186
MEF2C: 176
GATA3: 139
ESR1: 133
Other values (133): 2804

Length

Max length: 7
Median length: 5
Mean length: 4.667949423
Min length: 3

Characters and Unicode

Total characters: 16982
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 18
Unique (%): 0.5%

Sample

1st row: FOXA3
2nd row: GTF3A
3rd row: GTF3A
4th row: PAX7
5th row: NR4A1

Value               Count   Frequency (%)
HES5                200     0.2%
ARNTL               186     0.2%
MEF2C               176     0.2%
GATA3               139     0.2%
ESR1                133     0.2%
RBPJ                115     0.1%
RORA                110     0.1%
ASCL1               92      0.1%
KLF4                74      0.1%
MSX2                69      0.1%
Other values (128)  2344    2.9%
(Missing)           77850   95.5%
Histogram of lengths of the category
ValueCountFrequency (%)
hes5200
 
5.5%
arntl186
 
5.1%
mef2c176
 
4.8%
gata3139
 
3.8%
esr1133
 
3.7%
rbpj115
 
3.2%
rora110
 
3.0%
ascl192
 
2.5%
klf474
 
2.0%
msx269
 
1.9%
Other values (128)2344
64.4%

Most occurring characters

ValueCountFrequency (%)
A1424
 
8.4%
R1307
 
7.7%
E1191
 
7.0%
11124
 
6.6%
F974
 
5.7%
S964
 
5.7%
L757
 
4.5%
T744
 
4.4%
P737
 
4.3%
N723
 
4.3%
Other values (26)7037
41.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter13792
81.2%
Decimal Number3162
 
18.6%
Dash Punctuation28
 
0.2%

Most frequent character per category

ValueCountFrequency (%)
A1424
 
10.3%
R1307
 
9.5%
E1191
 
8.6%
F974
 
7.1%
S964
 
7.0%
L757
 
5.5%
T744
 
5.4%
P737
 
5.3%
N723
 
5.2%
H699
 
5.1%
Other values (15)4272
31.0%
ValueCountFrequency (%)
11124
35.5%
2685
21.7%
3469
14.8%
4324
 
10.2%
5235
 
7.4%
7129
 
4.1%
6124
 
3.9%
055
 
1.7%
815
 
0.5%
92
 
0.1%
ValueCountFrequency (%)
-28
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin13792
81.2%
Common3190
 
18.8%

Most frequent character per script

ValueCountFrequency (%)
A1424
 
10.3%
R1307
 
9.5%
E1191
 
8.6%
F974
 
7.1%
S964
 
7.0%
L757
 
5.5%
T744
 
5.4%
P737
 
5.3%
N723
 
5.2%
H699
 
5.1%
Other values (15)4272
31.0%
ValueCountFrequency (%)
11124
35.2%
2685
21.5%
3469
14.7%
4324
 
10.2%
5235
 
7.4%
7129
 
4.0%
6124
 
3.9%
055
 
1.7%
-28
 
0.9%
815
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII16982
100.0%

Most frequent character per block

ValueCountFrequency (%)
A1424
 
8.4%
R1307
 
7.7%
E1191
 
7.0%
11124
 
6.6%
F974
 
5.7%
S964
 
5.7%
L757
 
4.5%
T744
 
4.4%
P737
 
4.3%
N723
 
4.3%
Other values (26)7037
41.4%

Gene 51
Categorical

HIGH CARDINALITY
MISSING

Distinct: 118
Distinct (%): 4.8%
Missing: 79016
Missing (%): 97.0%
Memory size: 2.6 MiB

GATA3: 190
NFATC1: 151
ASCL1: 133
ARNTL: 98
PGR: 88
Other values (113): 1812

Length

Max length: 7
Median length: 5
Mean length: 4.795307443
Min length: 3

Characters and Unicode

Total characters: 11854
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 15
Unique (%): 0.6%

Sample

1st row: SREBF2
2nd row: HES5
3rd row: SREBF2
4th row: BHLHE40
5th row: GTF2B

Value               Count   Frequency (%)
GATA3               190     0.2%
NFATC1              151     0.2%
ASCL1               133     0.2%
ARNTL               98      0.1%
PGR                 88      0.1%
RBPJ                81      0.1%
ESR1                64      0.1%
HES5                59      0.1%
MSX2                57      0.1%
ATOH1               46      0.1%
Other values (108)  1505    1.8%
(Missing)           79016   97.0%
Histogram of lengths of the category
ValueCountFrequency (%)
gata3190
 
7.7%
nfatc1151
 
6.1%
ascl1133
 
5.4%
arntl98
 
4.0%
pgr88
 
3.6%
rbpj81
 
3.3%
esr164
 
2.6%
hes559
 
2.4%
msx257
 
2.3%
atoh146
 
1.9%
Other values (108)1505
60.9%

Most occurring characters

ValueCountFrequency (%)
A1271
 
10.7%
1902
 
7.6%
T765
 
6.5%
R741
 
6.3%
F713
 
6.0%
S637
 
5.4%
E579
 
4.9%
N579
 
4.9%
L524
 
4.4%
C512
 
4.3%
Other values (25)4631
39.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter9586
80.9%
Decimal Number2230
 
18.8%
Dash Punctuation38
 
0.3%

Most frequent character per category

ValueCountFrequency (%)
A1271
13.3%
T765
 
8.0%
R741
 
7.7%
F713
 
7.4%
S637
 
6.6%
E579
 
6.0%
N579
 
6.0%
L524
 
5.5%
C512
 
5.3%
P451
 
4.7%
Other values (15)2814
29.4%
ValueCountFrequency (%)
1902
40.4%
3461
20.7%
2374
16.8%
4186
 
8.3%
5126
 
5.7%
768
 
3.0%
663
 
2.8%
046
 
2.1%
84
 
0.2%
ValueCountFrequency (%)
-38
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin9586
80.9%
Common2268
 
19.1%

Most frequent character per script

ValueCountFrequency (%)
A1271
13.3%
T765
 
8.0%
R741
 
7.7%
F713
 
7.4%
S637
 
6.6%
E579
 
6.0%
N579
 
6.0%
L524
 
5.5%
C512
 
5.3%
P451
 
4.7%
Other values (15)2814
29.4%
ValueCountFrequency (%)
1902
39.8%
3461
20.3%
2374
16.5%
4186
 
8.2%
5126
 
5.6%
768
 
3.0%
663
 
2.8%
046
 
2.0%
-38
 
1.7%
84
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII11854
100.0%

Most frequent character per block

ValueCountFrequency (%)
A1271
 
10.7%
1902
 
7.6%
T765
 
6.5%
R741
 
6.3%
F713
 
6.0%
S637
 
5.4%
E579
 
4.9%
N579
 
4.9%
L524
 
4.4%
C512
 
4.3%
Other values (25)4631
39.1%

Gene 52
Categorical

HIGH CARDINALITY
MISSING

Distinct: 116
Distinct (%): 6.6%
Missing: 79738
Missing (%): 97.9%
Memory size: 2.5 MiB

RBPJ: 121
ESR1: 94
MSX2: 87
NFATC1: 65
MEF2C: 64
Other values (111): 1319

Length

Max length: 7
Median length: 5
Mean length: 4.650857143
Min length: 3

Characters and Unicode

Total characters: 8139
Distinct characters: 34
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 16
Unique (%): 0.9%

Sample

1st row: ESR2
2nd row: HES7
3rd row: CXXC5
4th row: ESR2
5th row: ESR2

Value               Count   Frequency (%)
RBPJ                121     0.1%
ESR1                94      0.1%
MSX2                87      0.1%
NFATC1              65      0.1%
MEF2C               64      0.1%
MEF2D               63      0.1%
GATA3               56      0.1%
RORA                42      0.1%
ATOH1               38      < 0.1%
PGR                 38      < 0.1%
Other values (106)  1082    1.3%
(Missing)           79738   97.9%
Histogram of lengths of the category
ValueCountFrequency (%)
rbpj121
 
6.9%
esr194
 
5.4%
msx287
 
5.0%
nfatc165
 
3.7%
mef2c64
 
3.7%
mef2d63
 
3.6%
gata356
 
3.2%
rora42
 
2.4%
pgr38
 
2.2%
atoh138
 
2.2%
Other values (106)1082
61.8%

Most occurring characters

ValueCountFrequency (%)
A625
 
7.7%
R612
 
7.5%
1605
 
7.4%
E541
 
6.6%
F535
 
6.6%
S397
 
4.9%
B378
 
4.6%
P376
 
4.6%
X359
 
4.4%
2342
 
4.2%
Other values (24)3369
41.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter6593
81.0%
Decimal Number1514
 
18.6%
Dash Punctuation32
 
0.4%

Most frequent character per category

ValueCountFrequency (%)
A625
 
9.5%
R612
 
9.3%
E541
 
8.2%
F535
 
8.1%
S397
 
6.0%
B378
 
5.7%
P376
 
5.7%
X359
 
5.4%
M325
 
4.9%
O315
 
4.8%
Other values (15)2130
32.3%
ValueCountFrequency (%)
1605
40.0%
2342
22.6%
3214
 
14.1%
4129
 
8.5%
583
 
5.5%
657
 
3.8%
752
 
3.4%
032
 
2.1%
ValueCountFrequency (%)
-32
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin6593
81.0%
Common1546
 
19.0%

Most frequent character per script

ValueCountFrequency (%)
A625
 
9.5%
R612
 
9.3%
E541
 
8.2%
F535
 
8.1%
S397
 
6.0%
B378
 
5.7%
P376
 
5.7%
X359
 
5.4%
M325
 
4.9%
O315
 
4.8%
Other values (15)2130
32.3%
ValueCountFrequency (%)
1605
39.1%
2342
22.1%
3214
 
13.8%
4129
 
8.3%
583
 
5.4%
657
 
3.7%
752
 
3.4%
032
 
2.1%
-32
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII8139
100.0%

Most frequent character per block

ValueCountFrequency (%)
A625
 
7.7%
R612
 
7.5%
1605
 
7.4%
E541
 
6.6%
F535
 
6.6%
S397
 
4.9%
B378
 
4.6%
P376
 
4.6%
X359
 
4.4%
2342
 
4.2%
Other values (24)3369
41.4%

Gene 53
Categorical

HIGH CARDINALITY
MISSING

Distinct: 109
Distinct (%): 9.2%
Missing: 80302
Missing (%): 98.5%
Memory size: 2.5 MiB

PGR: 64
HES5: 61
ATOH1: 58
ASCL1: 56
MEF2C: 55
Other values (104): 892

Length

Max length: 7
Median length: 5
Mean length: 4.588532884
Min length: 3

Characters and Unicode

Total characters: 5442
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 18
Unique (%): 1.5%

Sample

1st row: PROX1
2nd row: CXXC5
3rd row: ARID2
4th row: PAX7
5th row: KLF15

Value               Count   Frequency (%)
PGR                 64      0.1%
HES5                61      0.1%
ATOH1               58      0.1%
ASCL1               56      0.1%
MEF2C               55      0.1%
PAX3                47      0.1%
MSX2                36      < 0.1%
ARNTL               34      < 0.1%
FOXE3               29      < 0.1%
ESR1                28      < 0.1%
Other values (99)   718     0.9%
(Missing)           80302   98.5%
Histogram of lengths of the category
ValueCountFrequency (%)
pgr64
 
5.4%
hes561
 
5.1%
atoh158
 
4.9%
ascl156
 
4.7%
mef2c55
 
4.6%
pax347
 
4.0%
msx236
 
3.0%
arntl34
 
2.9%
foxe329
 
2.4%
esr128
 
2.4%
Other values (99)718
60.5%

Most occurring characters

ValueCountFrequency (%)
A490
 
9.0%
E341
 
6.3%
1335
 
6.2%
X325
 
6.0%
F321
 
5.9%
S317
 
5.8%
P313
 
5.8%
R313
 
5.8%
H250
 
4.6%
2234
 
4.3%
Other values (25)2203
40.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter4420
81.2%
Decimal Number1014
 
18.6%
Dash Punctuation8
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
A490
 
11.1%
E341
 
7.7%
X325
 
7.4%
F321
 
7.3%
S317
 
7.2%
P313
 
7.1%
R313
 
7.1%
H250
 
5.7%
L234
 
5.3%
O205
 
4.6%
Other values (15)1311
29.7%
ValueCountFrequency (%)
1335
33.0%
2234
23.1%
3172
17.0%
591
 
9.0%
485
 
8.4%
749
 
4.8%
634
 
3.4%
012
 
1.2%
82
 
0.2%
ValueCountFrequency (%)
-8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4420
81.2%
Common1022
 
18.8%

Most frequent character per script

ValueCountFrequency (%)
A490
 
11.1%
E341
 
7.7%
X325
 
7.4%
F321
 
7.3%
S317
 
7.2%
P313
 
7.1%
R313
 
7.1%
H250
 
5.7%
L234
 
5.3%
O205
 
4.6%
Other values (15)1311
29.7%
ValueCountFrequency (%)
1335
32.8%
2234
22.9%
3172
16.8%
591
 
8.9%
485
 
8.3%
749
 
4.8%
634
 
3.3%
012
 
1.2%
-8
 
0.8%
82
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII5442
100.0%

Most frequent character per block

ValueCountFrequency (%)
A490
 
9.0%
E341
 
6.3%
1335
 
6.2%
X325
 
6.0%
F321
 
5.9%
S317
 
5.8%
P313
 
5.8%
R313
 
5.8%
H250
 
4.6%
2234
 
4.3%
Other values (25)2203
40.5%

Gene 54
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct: 94
Distinct (%): 12.3%
Missing: 80721
Missing (%): 99.1%
Memory size: 2.5 MiB

MSX2: 64
GATA3: 56
RBPJ: 48
ASCL1: 42
MITF: 35
Other values (89): 522

Length

Max length: 7
Median length: 5
Mean length: 4.732724902
Min length: 3

Characters and Unicode

Total characters: 3630
Distinct characters: 34
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 21
Unique (%): 2.7%

Sample

1st row: GRHL1
2nd row: SREBF2
3rd row: EGR3
4th row: POU4F3
5th row: CXXC5

Value               Count   Frequency (%)
MSX2                64      0.1%
GATA3               56      0.1%
RBPJ                48      0.1%
ASCL1               42      0.1%
MITF                35      < 0.1%
NFATC1              30      < 0.1%
POU4F3              29      < 0.1%
ARID2               29      < 0.1%
HNF1A               24      < 0.1%
ATOH1               24      < 0.1%
Other values (84)   386     0.5%
(Missing)           80721   99.1%
Histogram of lengths of the category
ValueCountFrequency (%)
msx264
 
8.3%
gata356
 
7.3%
rbpj48
 
6.3%
ascl142
 
5.5%
mitf35
 
4.6%
nfatc130
 
3.9%
pou4f329
 
3.8%
arid229
 
3.8%
hnf1a24
 
3.1%
atoh124
 
3.1%
Other values (84)386
50.3%

Most occurring characters

ValueCountFrequency (%)
A375
 
10.3%
1262
 
7.2%
F238
 
6.6%
R214
 
5.9%
X205
 
5.6%
P192
 
5.3%
S179
 
4.9%
T177
 
4.9%
O163
 
4.5%
3154
 
4.2%
Other values (24)1471
40.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter2906
80.1%
Decimal Number720
 
19.8%
Dash Punctuation4
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
A375
12.9%
F238
 
8.2%
R214
 
7.4%
X205
 
7.1%
P192
 
6.6%
S179
 
6.2%
T177
 
6.1%
O163
 
5.6%
M125
 
4.3%
G117
 
4.0%
Other values (15)921
31.7%
ValueCountFrequency (%)
1262
36.4%
3154
21.4%
2145
20.1%
478
 
10.8%
029
 
4.0%
620
 
2.8%
516
 
2.2%
716
 
2.2%
ValueCountFrequency (%)
-4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2906
80.1%
Common724
 
19.9%

Most frequent character per script

ValueCountFrequency (%)
A375
12.9%
F238
 
8.2%
R214
 
7.4%
X205
 
7.1%
P192
 
6.6%
S179
 
6.2%
T177
 
6.1%
O163
 
5.6%
M125
 
4.3%
G117
 
4.0%
Other values (15)921
31.7%
ValueCountFrequency (%)
1262
36.2%
3154
21.3%
2145
20.0%
478
 
10.8%
029
 
4.0%
620
 
2.8%
516
 
2.2%
716
 
2.2%
-4
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII3630
100.0%

Most frequent character per block

ValueCountFrequency (%)
A375
 
10.3%
1262
 
7.2%
F238
 
6.6%
R214
 
5.9%
X205
 
5.6%
P192
 
5.3%
S179
 
4.9%
T177
 
4.9%
O163
 
4.5%
3154
 
4.2%
Other values (24)1471
40.5%

Gene 55
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct: 72
Distinct (%): 14.8%
Missing: 81001
Missing (%): 99.4%
Memory size: 2.5 MiB

ATOH1: 43
RBPJ: 37
ESR1: 28
PAX6: 26
MEF2C: 23
Other values (67): 330

Length

Max length: 7
Median length: 4
Mean length: 4.531827515
Min length: 3

Characters and Unicode

Total characters: 2207
Distinct characters: 34
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 15
Unique (%): 3.1%

Sample

1st row: HEY1
2nd row: TWIST2
3rd row: EGR3
4th row: HEY1
5th row: ARID2

Value               Count   Frequency (%)
ATOH1               43      0.1%
RBPJ                37      < 0.1%
ESR1                28      < 0.1%
PAX6                26      < 0.1%
MEF2C               23      < 0.1%
FOXE3               21      < 0.1%
MSX2                21      < 0.1%
POU4F3              12      < 0.1%
ARID2               12      < 0.1%
FOXO4               11      < 0.1%
Other values (62)   253     0.3%
(Missing)           81001   99.4%
Histogram of lengths of the category
ValueCountFrequency (%)
atoh143
 
8.8%
rbpj37
 
7.6%
esr128
 
5.7%
pax626
 
5.3%
mef2c23
 
4.7%
foxe321
 
4.3%
msx221
 
4.3%
pou4f312
 
2.5%
arid212
 
2.5%
esr211
 
2.3%
Other values (62)253
52.0%

Most occurring characters

ValueCountFrequency (%)
E159
 
7.2%
1147
 
6.7%
O147
 
6.7%
R146
 
6.6%
X145
 
6.6%
F139
 
6.3%
A134
 
6.1%
P133
 
6.0%
H119
 
5.4%
S102
 
4.6%
Other values (24)836
37.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1776
80.5%
Decimal Number424
 
19.2%
Dash Punctuation7
 
0.3%

Most frequent character per category

ValueCountFrequency (%)
E159
 
9.0%
O147
 
8.3%
R146
 
8.2%
X145
 
8.2%
F139
 
7.8%
A134
 
7.5%
P133
 
7.5%
H119
 
6.7%
S102
 
5.7%
B88
 
5.0%
Other values (15)464
26.1%
ValueCountFrequency (%)
1147
34.7%
296
22.6%
374
17.5%
440
 
9.4%
630
 
7.1%
718
 
4.2%
512
 
2.8%
07
 
1.7%
ValueCountFrequency (%)
-7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1776
80.5%
Common431
 
19.5%

Most frequent character per script

ValueCountFrequency (%)
E159
 
9.0%
O147
 
8.3%
R146
 
8.2%
X145
 
8.2%
F139
 
7.8%
A134
 
7.5%
P133
 
7.5%
H119
 
6.7%
S102
 
5.7%
B88
 
5.0%
Other values (15)464
26.1%
ValueCountFrequency (%)
1147
34.1%
296
22.3%
374
17.2%
440
 
9.3%
630
 
7.0%
718
 
4.2%
512
 
2.8%
07
 
1.6%
-7
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII2207
100.0%

Most frequent character per block

ValueCountFrequency (%)
E159
 
7.2%
1147
 
6.7%
O147
 
6.7%
R146
 
6.6%
X145
 
6.6%
F139
 
6.3%
A134
 
6.1%
P133
 
6.0%
H119
 
5.4%
S102
 
4.6%
Other values (24)836
37.9%

Gene 56
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct: 51
Distinct (%): 18.5%
Missing: 81212
Missing (%): 99.7%
Memory size: 2.5 MiB

POU4F3: 21
PGR: 21
ARID2: 21
ATOH1: 14
ASCL1: 13
Other values (46): 186

Length

Max length: 7
Median length: 5
Mean length: 4.688405797
Min length: 3

Characters and Unicode

Total characters: 1294
Distinct characters: 33
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): 2.9%

Sample

1st row: SREBF2
2nd row: PROX1
3rd row: PROX1
4th row: HEYL
5th row: SIX6

Value               Count   Frequency (%)
POU4F3              21      < 0.1%
PGR                 21      < 0.1%
ARID2               21      < 0.1%
ATOH1               14      < 0.1%
ASCL1               13      < 0.1%
HES5                12      < 0.1%
SOX10               9       < 0.1%
NEUROG2             9       < 0.1%
EOMES               9       < 0.1%
MSX2                9       < 0.1%
Other values (41)   138     0.2%
(Missing)           81212   99.7%
Histogram of lengths of the category
ValueCountFrequency (%)
arid221
 
7.6%
pgr21
 
7.6%
pou4f321
 
7.6%
atoh114
 
5.1%
ascl113
 
4.7%
hes512
 
4.3%
msx29
 
3.3%
eomes9
 
3.3%
sox109
 
3.3%
neurog29
 
3.3%
Other values (41)138
50.0%

Most occurring characters

ValueCountFrequency (%)
O93
 
7.2%
E92
 
7.1%
R85
 
6.6%
S82
 
6.3%
X82
 
6.3%
A82
 
6.3%
176
 
5.9%
P69
 
5.3%
F61
 
4.7%
261
 
4.7%
Other values (23)511
39.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1021
78.9%
Decimal Number272
 
21.0%
Dash Punctuation1
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
O93
 
9.1%
E92
 
9.0%
R85
 
8.3%
S82
 
8.0%
X82
 
8.0%
A82
 
8.0%
P69
 
6.8%
F61
 
6.0%
H51
 
5.0%
G37
 
3.6%
Other values (14)287
28.1%
ValueCountFrequency (%)
176
27.9%
261
22.4%
345
16.5%
430
 
11.0%
521
 
7.7%
614
 
5.1%
714
 
5.1%
011
 
4.0%
ValueCountFrequency (%)
-1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1021
78.9%
Common273
 
21.1%

Most frequent character per script

ValueCountFrequency (%)
O93
 
9.1%
E92
 
9.0%
R85
 
8.3%
S82
 
8.0%
X82
 
8.0%
A82
 
8.0%
P69
 
6.8%
F61
 
6.0%
H51
 
5.0%
G37
 
3.6%
Other values (14)287
28.1%
ValueCountFrequency (%)
176
27.8%
261
22.3%
345
16.5%
430
 
11.0%
521
 
7.7%
614
 
5.1%
714
 
5.1%
011
 
4.0%
-1
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII1294
100.0%

Most frequent character per block

ValueCountFrequency (%)
O93
 
7.2%
E92
 
7.1%
R85
 
6.6%
S82
 
6.3%
X82
 
6.3%
A82
 
6.3%
176
 
5.9%
P69
 
5.3%
F61
 
4.7%
261
 
4.7%
Other values (23)511
39.5%

Gene 57
Categorical

HIGH CORRELATION
MISSING

Distinct: 26
Distinct (%): 21.8%
Missing: 81369
Missing (%): 99.9%
Memory size: 2.5 MiB

MSX2: 21
RBPJ: 11
SOX21: 9
MESP1: 9
GATA3: 8
Other values (21): 61

Length

Max length: 7
Median length: 5
Mean length: 4.756302521
Min length: 4

Characters and Unicode

Total characters: 566
Distinct characters: 28
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): 4.2%

Sample

1st row: GLI1
2nd row: ESR2
3rd row: PROX1
4th row: MESP1
5th row: ESR2

Value               Count   Frequency (%)
MSX2                21      < 0.1%
RBPJ                11      < 0.1%
SOX21               9       < 0.1%
MESP1               9       < 0.1%
GATA3               8       < 0.1%
PROX1               7       < 0.1%
POU4F3              7       < 0.1%
ARID2               7       < 0.1%
ATOH1               6       < 0.1%
RELA                4       < 0.1%
Other values (16)   30      < 0.1%
(Missing)           81369   99.9%
Histogram of lengths of the category
ValueCountFrequency (%)
msx221
17.6%
rbpj11
 
9.2%
mesp19
 
7.6%
sox219
 
7.6%
gata38
 
6.7%
pou4f37
 
5.9%
prox17
 
5.9%
arid27
 
5.9%
atoh16
 
5.0%
rela4
 
3.4%
Other values (16)30
25.2%

Most occurring characters

ValueCountFrequency (%)
S48
 
8.5%
247
 
8.3%
X47
 
8.3%
O45
 
8.0%
141
 
7.2%
R37
 
6.5%
A37
 
6.5%
P36
 
6.4%
M32
 
5.7%
E29
 
5.1%
Other values (18)167
29.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter443
78.3%
Decimal Number123
 
21.7%

Most frequent character per category

ValueCountFrequency (%)
S48
10.8%
X47
10.6%
O45
10.2%
R37
 
8.4%
A37
 
8.4%
P36
 
8.1%
M32
 
7.2%
E29
 
6.5%
F20
 
4.5%
T19
 
4.3%
Other values (12)93
21.0%
ValueCountFrequency (%)
247
38.2%
141
33.3%
320
16.3%
410
 
8.1%
03
 
2.4%
52
 
1.6%

Most occurring scripts

ValueCountFrequency (%)
Latin443
78.3%
Common123
 
21.7%

Most frequent character per script

ValueCountFrequency (%)
S48
10.8%
X47
10.6%
O45
10.2%
R37
 
8.4%
A37
 
8.4%
P36
 
8.1%
M32
 
7.2%
E29
 
6.5%
F20
 
4.5%
T19
 
4.3%
Other values (12)93
21.0%
ValueCountFrequency (%)
247
38.2%
141
33.3%
320
16.3%
410
 
8.1%
03
 
2.4%
52
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII566
100.0%

Most frequent character per block

ValueCountFrequency (%)
S48
 
8.5%
247
 
8.3%
X47
 
8.3%
O45
 
8.0%
141
 
7.2%
R37
 
6.5%
A37
 
6.5%
P36
 
6.4%
M32
 
5.7%
E29
 
5.1%
Other values (18)167
29.5%

Gene 58
Categorical

HIGH CORRELATION
MISSING

Distinct: 22
Distinct (%): 39.3%
Missing: 81432
Missing (%): 99.9%
Memory size: 2.5 MiB
Most frequent categories (bar chart): ATOH1 (14), FOXE3, ESR1, POU4F3, ARID2, Other values (17) (25)

Length

Max length: 6
Median length: 5
Mean length: 4.642857143
Min length: 3

Characters and Unicode

Total characters: 260
Distinct characters: 31
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): 16.1%

Sample

1st row: POU4F3
2nd row: HES7
3rd row: HEY1
4th row: PBX1
5th row: MESP1

Value              Count   Frequency (%)
ATOH1              14      < 0.1%
FOXE3              7       < 0.1%
ESR1               4       < 0.1%
POU4F3             3       < 0.1%
ARID2              3       < 0.1%
MESP1              2       < 0.1%
HEYL               2       < 0.1%
PBX1               2       < 0.1%
HEY1               2       < 0.1%
HES7               2       < 0.1%
Other values (12)  15      < 0.1%
(Missing)          81432   99.9%
Histogram of lengths of the category
ValueCountFrequency (%)
atoh114
25.0%
foxe37
12.5%
esr14
 
7.1%
arid23
 
5.4%
pou4f33
 
5.4%
mesp12
 
3.6%
pbx12
 
3.6%
hes52
 
3.6%
hey12
 
3.6%
sox212
 
3.6%
Other values (12)15
26.8%

Most occurring characters

ValueCountFrequency (%)
127
 
10.4%
O26
 
10.0%
E26
 
10.0%
H23
 
8.8%
A18
 
6.9%
T17
 
6.5%
S13
 
5.0%
312
 
4.6%
X12
 
4.6%
F11
 
4.2%
Other values (21)75
28.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter204
78.5%
Decimal Number55
 
21.2%
Dash Punctuation1
 
0.4%

Most frequent character per category

ValueCountFrequency (%)
O26
12.7%
E26
12.7%
H23
11.3%
A18
8.8%
T17
8.3%
S13
 
6.4%
X12
 
5.9%
F11
 
5.4%
R11
 
5.4%
B8
 
3.9%
Other values (13)39
19.1%
ValueCountFrequency (%)
127
49.1%
312
21.8%
27
 
12.7%
43
 
5.5%
73
 
5.5%
52
 
3.6%
61
 
1.8%
ValueCountFrequency (%)
-1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin204
78.5%
Common56
 
21.5%

Most frequent character per script

ValueCountFrequency (%)
O26
12.7%
E26
12.7%
H23
11.3%
A18
8.8%
T17
8.3%
S13
 
6.4%
X12
 
5.9%
F11
 
5.4%
R11
 
5.4%
B8
 
3.9%
Other values (13)39
19.1%
ValueCountFrequency (%)
127
48.2%
312
21.4%
27
 
12.5%
43
 
5.4%
73
 
5.4%
52
 
3.6%
-1
 
1.8%
61
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII260
100.0%

Most frequent character per block

ValueCountFrequency (%)
127
 
10.4%
O26
 
10.0%
E26
 
10.0%
H23
 
8.8%
A18
 
6.9%
T17
 
6.5%
S13
 
5.0%
312
 
4.6%
X12
 
4.6%
F11
 
4.2%
Other values (21)75
28.8%

Gene 59
Categorical

HIGH CORRELATION
MISSING

Distinct: 10
Distinct (%): 41.7%
Missing: 81464
Missing (%): > 99.9%
Memory size: 2.5 MiB
Most frequent categories (bar chart): ARID2, POU4F3, PGR, NFATC2, CXXC5, Other values (5)

Length

Max length: 6
Median length: 5
Mean length: 4.958333333
Min length: 3

Characters and Unicode

Total characters: 119
Distinct characters: 23
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): 29.2%

Sample

1st row: ARID2
2nd row: ARID2
3rd row: ARID2
4th row: PAX7
5th row: SIX6

Value              Count   Frequency (%)
ARID2              7       < 0.1%
POU4F3             7       < 0.1%
PGR                3       < 0.1%
NFATC2             1       < 0.1%
CXXC5              1       < 0.1%
IRF4               1       < 0.1%
MLXIPL             1       < 0.1%
PAX7               1       < 0.1%
SIX6               1       < 0.1%
MAFB               1       < 0.1%
(Missing)          81464   > 99.9%
Histogram of lengths of the category
ValueCountFrequency (%)
arid27
29.2%
pou4f37
29.2%
pgr3
12.5%
mlxipl1
 
4.2%
cxxc51
 
4.2%
mafb1
 
4.2%
pax71
 
4.2%
irf41
 
4.2%
six61
 
4.2%
nfatc21
 
4.2%

Most occurring characters

ValueCountFrequency (%)
P12
10.1%
R11
 
9.2%
A10
 
8.4%
I10
 
8.4%
F10
 
8.4%
28
 
6.7%
48
 
6.7%
D7
 
5.9%
O7
 
5.9%
U7
 
5.9%
Other values (13)29
24.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter93
78.2%
Decimal Number26
 
21.8%

Most frequent character per category

ValueCountFrequency (%)
P12
12.9%
R11
11.8%
A10
10.8%
I10
10.8%
F10
10.8%
D7
7.5%
O7
7.5%
U7
7.5%
X5
5.4%
C3
 
3.2%
Other values (7)11
11.8%
ValueCountFrequency (%)
28
30.8%
48
30.8%
37
26.9%
71
 
3.8%
61
 
3.8%
51
 
3.8%

Most occurring scripts

ValueCountFrequency (%)
Latin93
78.2%
Common26
 
21.8%

Most frequent character per script

ValueCountFrequency (%)
P12
12.9%
R11
11.8%
A10
10.8%
I10
10.8%
F10
10.8%
D7
7.5%
O7
7.5%
U7
7.5%
X5
5.4%
C3
 
3.2%
Other values (7)11
11.8%
ValueCountFrequency (%)
28
30.8%
48
30.8%
37
26.9%
71
 
3.8%
61
 
3.8%
51
 
3.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII119
100.0%

Most frequent character per block

ValueCountFrequency (%)
P12
10.1%
R11
 
9.2%
A10
 
8.4%
I10
 
8.4%
F10
 
8.4%
28
 
6.7%
48
 
6.7%
D7
 
5.9%
O7
 
5.9%
U7
 
5.9%
Other values (13)29
24.4%

Gene 60
Categorical

HIGH CORRELATION
MISSING

Distinct: 5
Distinct (%): 71.4%
Missing: 81481
Missing (%): > 99.9%
Memory size: 2.5 MiB
Most frequent categories (bar chart): MSX2, PROX1, KLF10, EGR3, BCL6

Length

Max length: 5
Median length: 4
Mean length: 4.285714286
Min length: 4

Characters and Unicode

Total characters: 30
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): 57.1%

Sample

1st row: PROX1
2nd row: KLF10
3rd row: EGR3
4th row: MSX2
5th row: BCL6

Value      Count   Frequency (%)
MSX2       3       < 0.1%
PROX1      1       < 0.1%
KLF10      1       < 0.1%
EGR3       1       < 0.1%
BCL6       1       < 0.1%
(Missing)  81481   > 99.9%
Histogram of lengths of the category
ValueCountFrequency (%)
msx23
42.9%
egr31
 
14.3%
bcl61
 
14.3%
klf101
 
14.3%
prox11
 
14.3%

Most occurring characters

ValueCountFrequency (%)
X4
13.3%
M3
 
10.0%
S3
 
10.0%
23
 
10.0%
R2
 
6.7%
12
 
6.7%
L2
 
6.7%
P1
 
3.3%
O1
 
3.3%
K1
 
3.3%
Other values (8)8
26.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter22
73.3%
Decimal Number8
 
26.7%

Most frequent character per category

ValueCountFrequency (%)
X4
18.2%
M3
13.6%
S3
13.6%
R2
9.1%
L2
9.1%
P1
 
4.5%
O1
 
4.5%
K1
 
4.5%
F1
 
4.5%
E1
 
4.5%
Other values (3)3
13.6%
ValueCountFrequency (%)
23
37.5%
12
25.0%
01
 
12.5%
31
 
12.5%
61
 
12.5%

Most occurring scripts

ValueCountFrequency (%)
Latin22
73.3%
Common8
 
26.7%

Most frequent character per script

ValueCountFrequency (%)
X4
18.2%
M3
13.6%
S3
13.6%
R2
9.1%
L2
9.1%
P1
 
4.5%
O1
 
4.5%
K1
 
4.5%
F1
 
4.5%
E1
 
4.5%
Other values (3)3
13.6%
ValueCountFrequency (%)
23
37.5%
12
25.0%
01
 
12.5%
31
 
12.5%
61
 
12.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII30
100.0%

Most frequent character per block

ValueCountFrequency (%)
X4
13.3%
M3
 
10.0%
S3
 
10.0%
23
 
10.0%
R2
 
6.7%
12
 
6.7%
L2
 
6.7%
P1
 
3.3%
O1
 
3.3%
K1
 
3.3%
Other values (8)8
26.7%

Gene 61
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): 75.0%
Missing: 81484
Missing (%): > 99.9%
Memory size: 2.5 MiB
Most frequent categories (bar chart): ATOH1, MLXIPL, FOXE3

Length

Max length: 6
Median length: 5
Mean length: 5.25
Min length: 5

Characters and Unicode

Total characters: 21
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 50.0%

Sample

1st row: FOXE3
2nd row: MLXIPL
3rd row: ATOH1
4th row: ATOH1

Value      Count   Frequency (%)
ATOH1      2       < 0.1%
MLXIPL     1       < 0.1%
FOXE3      1       < 0.1%
(Missing)  81484   > 99.9%
Histogram of lengths of the category
ValueCountFrequency (%)
atoh12
50.0%
mlxipl1
25.0%
foxe31
25.0%

Most occurring characters

ValueCountFrequency (%)
O3
14.3%
X2
9.5%
L2
9.5%
A2
9.5%
T2
9.5%
H2
9.5%
12
9.5%
F1
 
4.8%
E1
 
4.8%
31
 
4.8%
Other values (3)3
14.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter18
85.7%
Decimal Number3
 
14.3%

Most frequent character per category

ValueCountFrequency (%)
O3
16.7%
X2
11.1%
L2
11.1%
A2
11.1%
T2
11.1%
H2
11.1%
F1
 
5.6%
E1
 
5.6%
M1
 
5.6%
I1
 
5.6%
ValueCountFrequency (%)
12
66.7%
31
33.3%

Most occurring scripts

ValueCountFrequency (%)
Latin18
85.7%
Common3
 
14.3%

Most frequent character per script

ValueCountFrequency (%)
O3
16.7%
X2
11.1%
L2
11.1%
A2
11.1%
T2
11.1%
H2
11.1%
F1
 
5.6%
E1
 
5.6%
M1
 
5.6%
I1
 
5.6%
ValueCountFrequency (%)
12
66.7%
31
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII21
100.0%

Most frequent character per block

ValueCountFrequency (%)
O3
14.3%
X2
9.5%
L2
9.5%
A2
9.5%
T2
9.5%
H2
9.5%
12
9.5%
F1
 
4.8%
E1
 
4.8%
31
 
4.8%
Other values (3)3
14.3%

Gene 62
Categorical

HIGH CORRELATION
MISSING
UNIFORM

Distinct: 3
Distinct (%): 100.0%
Missing: 81485
Missing (%): > 99.9%
Memory size: 2.5 MiB
Most frequent categories (bar chart): KLF10, ARID2, POU4F3

Length

Max length: 6
Median length: 5
Mean length: 5.333333333
Min length: 5

Characters and Unicode

Total characters: 16
Distinct characters: 15
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): 100.0%

Sample

1st row: KLF10
2nd row: POU4F3
3rd row: ARID2

Value      Count   Frequency (%)
KLF10      1       < 0.1%
ARID2      1       < 0.1%
POU4F3     1       < 0.1%
(Missing)  81485   > 99.9%
Histogram of lengths of the category
ValueCountFrequency (%)
arid21
33.3%
klf101
33.3%
pou4f31
33.3%

Most occurring characters

ValueCountFrequency (%)
F2
 
12.5%
K1
 
6.2%
L1
 
6.2%
11
 
6.2%
01
 
6.2%
P1
 
6.2%
O1
 
6.2%
U1
 
6.2%
41
 
6.2%
31
 
6.2%
Other values (5)5
31.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter11
68.8%
Decimal Number5
31.2%

Most frequent character per category

ValueCountFrequency (%)
F2
18.2%
K1
9.1%
L1
9.1%
P1
9.1%
O1
9.1%
U1
9.1%
A1
9.1%
R1
9.1%
I1
9.1%
D1
9.1%
ValueCountFrequency (%)
11
20.0%
01
20.0%
41
20.0%
31
20.0%
21
20.0%

Most occurring scripts

ValueCountFrequency (%)
Latin11
68.8%
Common5
31.2%

Most frequent character per script

ValueCountFrequency (%)
F2
18.2%
K1
9.1%
L1
9.1%
P1
9.1%
O1
9.1%
U1
9.1%
A1
9.1%
R1
9.1%
I1
9.1%
D1
9.1%
ValueCountFrequency (%)
11
20.0%
01
20.0%
41
20.0%
31
20.0%
21
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII16
100.0%

Most frequent character per block

ValueCountFrequency (%)
F2
 
12.5%
K1
 
6.2%
L1
 
6.2%
11
 
6.2%
01
 
6.2%
P1
 
6.2%
O1
 
6.2%
U1
 
6.2%
41
 
6.2%
31
 
6.2%
Other values (5)5
31.2%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
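
The calculation described above can be sketched in plain Python as follows. This is a minimal illustration, not the report generator's own code; the function name `pearson_r` and the sample data are made up for the example. The 1/n normalisation factors of the covariance and the standard deviations cancel, so they are omitted.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of (co)deviations; the common 1/n factor cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)

# Shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([10 * x + 3 for x in xs], ys)
print(r, r_rescaled)
```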

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
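
A minimal sketch of this procedure: replace each value by its rank (ties receive the average of the ranks they span, which is a common convention and an assumption here), then apply the Pearson formula to the ranks. The helper names are illustrative only.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation; normalisation factors cancel and are omitted."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    dx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    dy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (dx * dy)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship (y = x cubed).
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For the cubic example, ρ reaches its maximum because the relationship is perfectly monotonic, while Pearson's r stays below 1 because it is not linear.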

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
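
The pair-counting definition can be written out directly. The sketch below implements the variant often called tau-a, which matches the description above literally; statistical libraries commonly default to tie-corrected variants, so their results can differ when the data contain ties. The function name and sample data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1  # both variables order the pair the same way
        elif direction < 0:
            discordant += 1  # the variables order the pair in opposite ways
    return (concordant - discordant) / len(pairs)


# Six pairs in total: five concordant, one discordant (the 2nd and 3rd points).
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```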

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
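
As a sketch of what such a bias-corrected estimate looks like, the function below follows the correction as it is usually stated for Bergsma's 2013 proposal: φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and floored at 0, and the row and column counts r and k are shrunk by (r-1)²/(n-1) and (k-1)²/(n-1). It is written from that formula rather than taken from the report generator, so the function name and edge-case handling are assumptions.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long lists of category labels."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    joint = Counter(zip(xs, ys))

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table (assumed convention for this sketch)
    return math.sqrt(phi2_corr / denom)


perfect = cramers_v_corrected(["a", "b"] * 10, ["x", "y"] * 10)
independent = cramers_v_corrected(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(perfect, independent)
```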

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
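
One way to make the nullity correlation concrete: encode each column as a 0/1 presence indicator and take the Pearson correlation between two such indicators. The sketch below does this for a small hypothetical table (the rows are invented for illustration, and the function names are not part of any library). Columns that are always present or always missing have no variance, so no correlation is defined for them.

```python
import math


def presence_indicator(rows, column):
    """1 where the column has a value, 0 where it is missing (None)."""
    return [0 if row.get(column) is None else 1 for row in rows]


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation of two presence indicators, or None if undefined."""
    a = presence_indicator(rows, col_a)
    b = presence_indicator(rows, col_b)
    n = len(rows)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # a column that is never (or always) missing
    return cov / math.sqrt(var_a * var_b)


# Hypothetical rows, not taken from the dataset.
rows = [
    {"Gene 1": "DLX2", "Gene 2": "SNAI1", "Gene 3": None},
    {"Gene 1": "GSX2", "Gene 2": "SP8", "Gene 3": "EGR1"},
    {"Gene 1": "ESR2", "Gene 2": None, "Gene 3": None},
    {"Gene 1": "AR", "Gene 2": "TWIST1", "Gene 3": "JUN"},
]

corr_23 = nullity_correlation(rows, "Gene 2", "Gene 3")
corr_12 = nullity_correlation(rows, "Gene 1", "Gene 2")
print(corr_23, corr_12)
```

A positive value means the two columns tend to be present (or missing) together, which is what one would expect for adjacent gene columns that are filled from left to right.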

Sample

First rows

(row)  Index  Line  Gene 1  Gene 2  Gene 3 to Gene 62 (each)
0      1      L1    DLX2    SNAI1   NaN
1      2      L1    GSX2    SP8     NaN
2      3      L1    ESR2    EGR1    NaN
3      4      L1    AR      TWIST1  NaN
4      5      L1    ASCL1   INSM1   NaN
5      6      L1    POU5F1  DNMT1   NaN
6      7      L1    FOXD3   SOX10   NaN
7      8      L1    ATF2    JUN     NaN
8      9      L1    FOXO3   FOXM1   NaN
9      10     L1    EGR1    CEBPA   NaN

Last rows

IndexLineGene 1Gene 2Gene 3Gene 4Gene 5Gene 6Gene 7Gene 8Gene 9Gene 10Gene 11Gene 12Gene 13Gene 14Gene 15Gene 16Gene 17Gene 18Gene 19Gene 20Gene 21Gene 22Gene 23Gene 24Gene 25Gene 26Gene 27Gene 28Gene 29Gene 30Gene 31Gene 32Gene 33Gene 34Gene 35Gene 36Gene 37Gene 38Gene 39Gene 40Gene 41Gene 42Gene 43Gene 44Gene 45Gene 46Gene 47Gene 48Gene 49Gene 50Gene 51Gene 52Gene 53Gene 54Gene 55Gene 56Gene 57Gene 58Gene 59Gene 60Gene 61Gene 62
8147881479L58STAT5BPAX5BACH2BCL6TP53NANOGPOU5F1SALL4MYCSNAI1ZEB1ARTWIST1FOXM1SOX9RUNX2SNAI2STAT1TBX21RUNX3EGR1CEBPAPPARGFOSPRDM1MSX1PAX3MYOD1MYOGMEF2AJUNDNMT1FOXP3IRF8GATA2SPI1KLF4CEBPBKLF5SOX2STAT3RORAARNTLNFATC1SP1FOXO3FOXO1KLF2GATA4MEF2CASCL1RBPJHES5GATA3ESR1PGRMSX2ATOH1POU4F3NaNNaNNaN
8147981480L58DLX3AHRBACH2BCL6TP53NANOGPOU5F1SALL4MYCSNAI1ZEB1ARTWIST1FOXM1SOX9RUNX2SNAI2STAT1TBX21RUNX3EGR1CEBPAPPARGFOSPRDM1MSX1PAX3MYOD1MYOGMEF2AJUNDNMT1FOXP3IRF8GATA2SPI1KLF4CEBPBKLF5SOX2STAT3RORAARNTLNFATC1SP1FOXO3FOXO1KLF2GATA4MEF2CASCL1RBPJHES5GATA3ESR1PGRMSX2ATOH1POU4F3NaNNaNNaN
8148081481L58TCF3PAX5BACH2BCL6TP53NANOGPOU5F1SALL4MYCSNAI1ZEB1ARTWIST1FOXM1SOX9RUNX2SNAI2STAT1TBX21RUNX3EGR1CEBPAPPARGFOSPRDM1MSX1PAX3MYOD1MYOGMEF2AJUNDNMT1FOXP3IRF8GATA2SPI1KLF4CEBPBKLF5SOX2STAT3RORAARNTLNFATC1SP1FOXO3FOXO1KLF2GATA4MEF2CASCL1RBPJHES5GATA3ESR1PGRMSX2ATOH1ARID2NaNNaNNaN
8148181482L59TP53HMGA2TWIST1POU5F1SALL4NANOGSTAT3SNAI1ZEB1ARFOXA1TET1EGR1CEBPAPPARGFOSPRDM1XBP1SP7SATB2RUNX2SNAI2STAT1TBX21RUNX3FOXP3IRF8GATA2SPI1KLF4ONECUT1NEUROG3FOXM1SOX2FOXD3PAX3MYOD1MYOGMEF2AJUNKLF5PPARANPAS2RORAARNTLNFATC1NFATC3MYCE2F1FOXO1KLF2GATA4MEF2CASCL1RBPJHES5GATA3MAFMAFBPROX1NaNNaN
8148281483L59TP53HMGA2TWIST1POU5F1SALL4NANOGSTAT3SNAI1ZEB1ARFOXA1TET1EGR1CEBPAPPARGFOSPRDM1XBP1SP7SATB2RUNX2SNAI2STAT1TBX21RUNX3FOXP3IRF8GATA2SPI1KLF4ONECUT1NEUROG3FOXM1SOX2FOXD3PAX3MYOD1MYOGMEF2AJUNKLF5PPARANPAS2RORAARNTLNFATC1NFATC3MYCE2F1FOXO1KLF2GATA4MEF2CNR3C1RELARELIRF4BCL6MLXIPLKLF10NaNNaN
8148381484L59JUNDNMT1TP53NANOGPOU5F1SALL4MYCSNAI1ZEB1ARTWIST1FOXM1SOX9RUNX2SNAI2STAT1TBX21RUNX3EGR1CEBPAPPARGFOSPRDM1MSX1PAX3MYOD1MYOGMEF2AESRRAPPARANPAS2RORAARNTLNFATC1SP1FOXO3FOXP3IRF8GATA2SPI1KLF4CEBPBKLF5SOX2STAT3MYCNASCL1RBPJHES5GATA3NKX3-1FOXO1KLF2GATA4MEF2CNR3C1RELARELNFATC2EGR3NaNNaN
8148481485L60TP53HMGA2TWIST1POU5F1SALL4NANOGSTAT3SNAI1ZEB1ARFOXA1TET1EGR1CEBPAPPARGFOSPRDM1XBP1SP7SATB2RUNX2SNAI2STAT1TBX21RUNX3FOXP3IRF8GATA2SPI1KLF4ONECUT1NEUROG3FOXM1SOX2FOXD3PAX3MYOD1MYOGMEF2AJUNKLF5PPARANPAS2RORAARNTLNFATC1NFATC3MYCE2F1FOXO1KLF2GATA4MEF2CASCL1RBPJHES5GATA3ESR1PGRMSX2FOXE3NaN
8148581486L61JUNDNMT1TP53NANOGPOU5F1SALL4MYCSNAI1ZEB1ARTWIST1FOXM1SOX9RUNX2SNAI2STAT1TBX21RUNX3EGR1CEBPAPPARGFOSPRDM1MSX1PAX3MYOD1MYOGMEF2AESRRAPPARANPAS2RORAARNTLNFATC1SP1FOXO3FOXP3IRF8GATA2SPI1KLF4CEBPBKLF5SOX2STAT3MYCNASCL1RBPJHES5GATA3NKX3-1FOXO1KLF2GATA4MEF2CNR3C1RELARELIRF4BCL6MLXIPLKLF10
8148681487L61TP53HMGA2TWIST1POU5F1SALL4NANOGSTAT3SNAI1ZEB1ARFOXA1TET1EGR1CEBPAPPARGFOSPRDM1XBP1SP7SATB2RUNX2SNAI2STAT1TBX21RUNX3FOXP3IRF8GATA2SPI1KLF4ONECUT1NEUROG3FOXM1SOX2FOXD3PAX3MYOD1MYOGMEF2AJUNKLF5PPARANPAS2RORAARNTLNFATC1NFATC3MYCE2F1FOXO1KLF2GATA4MEF2CASCL1RBPJHES5GATA3ESR1PGRMSX2ATOH1POU4F3
8148781488L61TP53HMGA2TWIST1POU5F1SALL4NANOGSTAT3SNAI1ZEB1ARFOXA1TET1EGR1CEBPAPPARGFOSPRDM1XBP1SP7SATB2RUNX2SNAI2STAT1TBX21RUNX3FOXP3IRF8GATA2SPI1KLF4ONECUT1NEUROG3FOXM1SOX2FOXD3PAX3MYOD1MYOGMEF2AJUNKLF5PPARANPAS2RORAARNTLNFATC1NFATC3MYCE2F1FOXO1KLF2GATA4MEF2CASCL1RBPJHES5GATA3ESR1PGRMSX2ATOH1ARID2