The Fashionpedia Ontology and Fashion Segmentation Dataset
Menglin Jia∗1, Mengyun Shi∗1, Mikhail Sirotenko∗3, Yin Cui1,2
Bharath Hariharan1, Claire Cardie1, Serge Belongie1,2
1Cornell University, 2Cornell Tech, 3Google AI
Abstract
As a step toward mapping out the visual aspects of the fashion world, we introduce the Fashionpedia ontology and fashion segmentation dataset. Fashionpedia consists of two parts: (1) an ontology built by fashion experts containing 27 main apparel objects, 19 apparel parts, and 92 fine-grained attributes, together with their relationships; and (2) a dataset of everyday and celebrity-event fashion images annotated with segmentation masks and their associated fine-grained attributes, built upon the backbone of the Fashionpedia ontology structure. The aim of our work is to cultivate research connections between the computer vision and fashion communities through the creation of a high-quality dataset and associated open competitions, thereby advancing the state of the art in fine-grained visual recognition for fashion and apparel.
1. Introduction

Fashion, in its various forms, influences many aspects of modern societies and has a strong financial and cultural impact. Recent breakthroughs in computer vision have given rise to increased interest in the visual analysis of fashion components. A key ingredient in these advances is the availability of large amounts of high-quality annotated training data. Evidence of this can be seen in the community's engagement with the COCO object recognition dataset [14] and the associated challenges that have run annually from 2015 to the present. One area that remains challenging for computers, however, is fine-grained visual recognition.

Figure 1. Overview of the Fashionpedia dataset: (a) the original image; (b) the image with main garment segmentation masks; (c) the image with both main garment and garment part segmentation masks; (d) an exploded view of the annotation diagram: the image is annotated with both segmentation masks and fine-grained attributes (black boxes). Relationship types shown in (d): Part of, Textile Finishing, Textile Pattern, Silhouette, Opening Type, Length, Nickname, Waistline.

Recently, we have observed an increasing effort to curate datasets for fine-grained visual recognition, evolving from the Caltech-UCSD Birds dataset [22] to the recent iNaturalist species classification and detection dataset [20]. The goal of this line of work is to advance the state of the art in automatic image classification for large numbers of real-world, fine-grained categories. What is missing from these datasets, however, is the capability of providing a structured representation of an image.

An understanding of the fashion world requires that we extend computers' ability beyond detecting objects and attributes to understanding the relationships and interactions between them. In light of this, we introduce the Fashionpedia ontology and image dataset with the aim of training and benchmarking computer vision models for a more comprehensive understanding of fashion.

The contributions of this work are:
• A fashion ontology informed by product descriptions from the internet and built by fashion experts. Our unified ontology captures the complex structure of fashion objects and the ambiguity in descriptions obtained from the web, containing 46 apparel objects (27 main apparel objects and 19 apparel parts) and 92 fine-grained attributes in total.
• A dataset of around 50K clothing images from daily life, celebrity events, and online shopping, annotated by crowd workers with segmentation masks and by fashion experts with fine-grained attributes. The current version of the dataset has 10K images labeled with both segmentation masks and fine-grained attributes; the remaining 40K are labeled with segmentation masks only.
• We introduce a novel fine-grained segmentation task and the associated competition¹ by joining forces between the fashion and computer vision communities. The proposed task unifies visual categorization and segmentation of rich apparel attributes, which we believe is an important step toward a structural understanding of fashion in real-world applications.

∗Equal contribution.
¹Kaggle competition website: https://www.kaggle.com/c/imaterialist-fashion-2019-FGVC6

2. Related Work

Table 1 summarizes the comparison among different datasets with clothing category and attribute labels. Our dataset distinguishes itself in the following three aspects:
• Exhaustive annotation of segmentation masks: Existing fashion datasets [5, 28] offer segmentation masks for the main garment (e.g., jacket, coat, dress) and the accessory categories (e.g., bag, shoe), but smaller garment objects such as collars and pockets are not annotated. These small objects could nonetheless be valuable for real-world applications such as searching for a specific collar shape during online shopping. Our dataset is annotated with segmentation masks not only for a total of 27 main garment and accessory categories but also for 19 garment parts (e.g., collar, sleeve, pocket, zipper, embroidery).
• Localized attributes: The fine-grained attributes in existing datasets [15, 9, 27] tend to be noisy, mainly because the annotations are collected by crawling fashion product images associated with attribute-level descriptions directly from large online shopping websites. Unlike these datasets, the fine-grained attributes in our dataset are annotated manually by fashion experts. Furthermore, to the best of our knowledge, ours is the first dataset annotated with localized attributes: fashion experts are asked to annotate the fine-grained attributes associated with the segmentation masks labeled by the crowd workers. Localized attributes could potentially help computational models detect and understand attributes more accurately.
• Fine categorization: Previous studies on attribute categorization suffer from several issues, including (1) repeated attributes belonging to the same category (e.g., zip, zipped, and zipper) [15, 8]; (2) only basic-level categorization (object recognition) and a lack of fine categorization (or “subordinate categorization”) [5, 28, 11, 21, 25, 24, 12, 18, 2, 19, 10, 6, 23]; and (3) a lack of fashion taxonomies that meet the needs of real-world applications in the fashion industry, possibly due to the research gap between fashion design and computer vision. To better facilitate research in fashion and computer vision, our proposed ontology is built and verified by fashion experts based on four sources: (1) world-leading e-commerce fashion websites (e.g., ZARA, H&M, Gap, Uniqlo, Forever21); (2) luxury fashion brands (e.g., Prada, Chanel, Gucci); (3) trend forecasting companies (e.g., WGSN); and (4) academic resources [4, 1].

3. Dataset Specification and Collection

3.1. Fashionpedia ontology and data representation

The Fashionpedia ontology relies on the notions of object (similar to “item” in Wikidata and “object” in Visual Genome [13]) and statement. Objects represent common apparel items. Statements describe detailed characteristics of an object and consist of a relationship (similar to “property” in Wikidata) and an attribute (similar to “value” in Wikidata). For example, we can add a relationship that specifies the silhouette of a garment by associating a silhouette attribute with it, or we can assign a material relationship to a button object by specifying a material attribute. In this section, we break down each component of the Fashionpedia ontology (Figure 2) and explain how a large-scale fashion ontology can be built upon the backbone of the Fashionpedia ontology structure.
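To make the object/statement split concrete, here is a minimal Python sketch. The class names FashionObject and Statement, their fields, and the helper attributes_for are hypothetical illustrations of the Wikidata-style item/property/value pattern described above; they are not the format in which Fashionpedia annotations are distributed. The attribute values ("symmetrical", "metal") are taken from the examples used in this paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    """A relationship paired with an attribute (cf. a Wikidata property/value pair)."""

    relationship: str  # e.g. "silhouette" or "material" (illustrative labels)
    attribute: str     # e.g. "symmetrical" or "metal"


@dataclass
class FashionObject:
    """An apparel object such as a main garment or garment part (cf. a Wikidata item)."""

    name: str
    statements: list[Statement] = field(default_factory=list)

    def add_statement(self, relationship: str, attribute: str) -> None:
        """Attach one characteristic to this object."""
        self.statements.append(Statement(relationship, attribute))

    def attributes_for(self, relationship: str) -> list[str]:
        """Return every attribute attached through the given relationship."""
        return [s.attribute for s in self.statements if s.relationship == relationship]


# The two kinds of statements mentioned in the text: a garment silhouette
# and a button material. Attribute values are illustrative.
jacket = FashionObject("jacket")
jacket.add_statement("silhouette", "symmetrical")

button = FashionObject("button")
button.add_statement("material", "metal")

print(jacket.attributes_for("silhouette"))  # ['symmetrical']
print(button.attributes_for("material"))    # ['metal']
```

Keeping the relationship explicit on every statement, rather than using one field per attribute type, lets new relationship types be added to the ontology without changing the object schema.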
Name Apparel Category Annotation Type Fine-Grained Attribute Annotation Type
Classification BBox Segmentation Unlocalized Localized Fine Categorization
Ups and Downs [7] MG
Fashion550k [10] MG,A
Fashion-MNIST [23] MG
Clothing Parsing [25] MG,A
Chic or Social [24] MG,A
Hipster [12] MG,A,S
Runway2Realway [21] MG,A
ModaNet [28] MG,A MG,A
DeepFashion2 [5] MG MG
UT Zappos50K [26] A X
Fashion200K [6] MG X
Fashion Style-128 Floats [18] S X
Fashion144k [17] MG,A X
FashionStyle14 [19] S X
Main Product Detection [27] MG X
StreetStyle-27K [16] X X
UT-latent look [8] MG,S X X
FashionAI [3] MG,GP,A X X
Apparel classification-Style [2] MG X X
DARN [9] MG X X
WTBI [11] MG,A X X
DeepFashion [15] S MG X X
Fashionpedia MG,GP,A MG,GP,A X X
Table 1. Comparison of Fashion Datasets (MG = Main Garment, GP = Garment Part, A = Accessory, S = Style).
Figure 2. The visualization of the Fashionpedia ontology (based on 20 image samples).

3.1.1 Main garments, garment parts, accessories, and their segmentation masks

In the Fashionpedia dataset, all images were annotated with main garments, and each main garment was also annotated with its garment parts. For example, general garment types such as jacket, dress, and pants are considered main garments. These garments also consist of several garment parts such as collars, sleeves, pockets, buttons, and embroideries. Main garments are divided into three main categories: outerwear, intimate, and accessories. Garment parts also have different types: garment main parts (e.g., collars, sleeves), bra parts, closures (e.g., button, zipper), and decorations (e.g., embroidery, ruffle). In the current version of Fashionpedia, each image contains on average 1 person, 3 main garments, 3 accessories, and 12 garment parts, each delineated by a tight segmentation mask (Figure 1 (b-c)). Furthermore, each object is canonicalized to a synset ID in our Fashionpedia ontology (Figure 2).

3.1.2 Fine-grained attributes

Each main garment and garment part is associated with apparel attributes (Figure 1 (d)). For example, “button” is part of the main garment “jacket”; “jacket” can be linked with the silhouette attribute “symmetrical”; and the garment part “button” could carry the attribute “metal” through the material relationship. Each image in Fashionpedia has an average of 16 attributes. As with main garments and garment parts, we canonicalize all attributes to our Fashionpedia ontology.
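The jacket example above can be read as a small graph of garments, parts, and attributes, the structure the paper formalizes as an apparel graph in Section 3.1.4. The following Python sketch stores it as labelled, directed edges. The class ApparelGraph, the edge label "part of", and the choice to point part-of edges from the part to its main garment are assumptions made for illustration; attribute edges point from the garment or part to the attribute, as the paper describes.

```python
from __future__ import annotations

from collections import defaultdict

PART_OF = "part of"  # meronymy edge label (hypothetical name)


class ApparelGraph:
    """Directed, edge-labelled graph for one outfit (illustrative sketch only).

    Nodes are plain strings naming main garments, garment parts, or attributes.
    Attribute edges point from a garment or garment part to an attribute; we
    assume part-of edges point from a garment part to its main garment.
    """

    def __init__(self) -> None:
        # node -> list of (relationship, target) pairs
        self._edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_part(self, part: str, garment: str) -> None:
        self._edges[part].append((PART_OF, garment))

    def add_attribute(self, node: str, relationship: str, attribute: str) -> None:
        self._edges[node].append((relationship, attribute))

    def parts_of(self, garment: str) -> list[str]:
        """Garment parts linked to `garment` by a part-of edge."""
        return sorted(
            node for node, edges in self._edges.items() if (PART_OF, garment) in edges
        )

    def attributes_of(self, node: str) -> dict[str, list[str]]:
        """Attributes of `node`, grouped by relationship (part-of edges excluded)."""
        grouped: dict[str, list[str]] = defaultdict(list)
        # .get avoids creating an empty entry for unknown nodes
        for relationship, target in self._edges.get(node, []):
            if relationship != PART_OF:
                grouped[relationship].append(target)
        return dict(grouped)


# The jacket example from Section 3.1.2.
graph = ApparelGraph()
graph.add_part("button", "jacket")
graph.add_attribute("jacket", "silhouette", "symmetrical")
graph.add_attribute("button", "material", "metal")

print(graph.parts_of("jacket"))       # ['button']
print(graph.attributes_of("jacket"))  # {'silhouette': ['symmetrical']}
print(graph.attributes_of("button"))  # {'material': ['metal']}
```

Grouping attributes by relationship mirrors the relationship types shown in Figure 1 (d), such as silhouette, textile pattern, and length.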
3.1.3 Relationships

There are three main types of relationships: (1) from outfits to main garments and from main garments to garment parts: meronymy (part-of) relationships (Figure 1 (d)); (2) from main garments or garment parts to attributes: these relationship types include garment silhouette (e.g., peplum), collar nickname (e.g., Peter Pan collar), textile type (e.g., lace), textile finishing (e.g., distressed), and textile-fabric pattern (e.g., paisley); (3) within garments, garment parts, or attributes: there are at most four levels of hyponymy (is-an-instance-of) relationships. For example, weft knit is an instance of knit fabric, and fleece is an instance of weft knit.

3.1.4 Apparel graphs

Integrating the main garments, garment parts, attributes, and relationships, we create an apparel graph representation for each outfit in an image. Each apparel graph is a structured representation of an outfit ensemble containing certain types of garments. Nodes in the graph represent main garments, garment parts, and attributes. Main garments and garment parts are linked to their respective attributes through different types of relationships. The relationships connecting garment objects and attributes point from the main garments to their attributes and from the garment parts to their corresponding attributes. Figure 1 (d) illustrates one example of the apparel graph for a jacket.

3.1.5 Fashionpedia ontology

While apparel graphs are localized representations of particular outfit ensembles in fashion images, we also create a single Fashionpedia ontology (Figure 2). The Fashionpedia ontology is the union of all apparel graphs and contains all main garments, garment parts, attributes, and relationships. This allows us to combine multiple levels of information more coherently.

3.2. Images Collection

A total of 48,827 images were harvested from Flickr and free-license photo websites (Unsplash, Burst by Shopify, Freestocks, Kaboompics, and Pexels). Two fashion experts were asked to manually verify the quality of the collected images. The annotation process consists of two phases: first, crowd workers annotated segmentation masks for the apparel objects; second, 15 fashion experts were recruited to annotate the fine-grained attributes for the segmentation masks labeled in the first phase.

4. Conclusion

In this work, we propose the Fashionpedia ontology and fashion segmentation dataset. To the best of our knowledge, Fashionpedia is the first dataset that combines part-level segmentation with fine-grained attributes. The expected outcome of this project is to advance the state of the art in domain-specific fine-grained visual recognition. We expect our Fashionpedia image dataset and its associated ontology to be applicable to many applications, including better product recommendation for online shoppers, enhanced visual search results, and the resolution of ambiguous fashion-related words in textual queries. Finally, we expect that our work will act as a catalyst for increased attention to domain-specific ontologies for fashion by joining forces between the fashion, computer vision, and natural language processing communities.

5. Acknowledgements

We thank Kavita Bala, Carla Gomes, Dustin Hwang, Rohun Tripathi, Omid Poursaeed, Hector Liu, and Nayanathara Palanivel for their helpful feedback and discussion in the development of the Fashionpedia dataset. We also thank Zeqi Gu, Fisher Yu, Wenqi Xian, Chao Suo, Junwen Bai, Paul Upchurch, Anmol Kabra, and Brendan Rappazzo for their help developing the fine-grained attribute annotation tool.

References

[1] Bloomsbury.com. Fashion photography archive. Retrieved May 9, 2019 from https://www.bloomsbury.com/dr/digital-resources/products/fashion-photography-archive/.
[2] L. Bossard, M. Dantone, C. Leistner, C. Wengert, T. Quack, and L. Van Gool. Apparel classification with style. In Computer Vision – ACCV 2012, pages 321–335, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
[3] FashionAI. Retrieved May 9, 2019 from http://fashionai.alibaba.com/.
[4] Fashionary.org. Fashionpedia - the visual dictionary of fashion design. Retrieved May 9, 2019 from https://fashionary.org/products/fashionpedia.
[5] Y. Ge, R. Zhang, L. Wu, X. Wang, X. Tang, and P. Luo. DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images. arXiv:1901.07973 [cs], Jan. 2019.
[6] X. Han, Z. Wu, P. X. Huang, X. Zhang, M. Zhu, Y. Li, Y. Zhao, and L. S. Davis. Automatic spatially-aware fashion concept discovery. In ICCV, 2017.
[7] R. He and J. McAuley. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In Proceedings of the 25th International Conference on World Wide Web (WWW '16), pages 507–517, 2016. arXiv:1602.01585.