This project investigates the makeup of the Metropolitan Museum of Art’s Photography Collection. The Met is one of the largest art museums in the U.S., and its Photography Collection contains nearly 40,000 photographs from all points throughout the medium’s history. The large scope of the collection makes it potentially valuable to study on a number of levels.
The visualizations created for this project are primarily related to the makeup and distribution of various photographic processes present in the collection. The goal is that, through concise and efficient visualizations, a user can learn about the variety of photographic processes present in The Met’s collection, how a process’s use has changed over time, what processes artists have worked with, and how processes might be related to one another.
These visualizations have the potential to benefit those with an interest in gallery and museum studies or the history of photography, photography practitioners, and students and instructors of photography.
Process and Rationale
This project is composed of three primary visualizations, each of which is designed to investigate a different aspect of photographic processes in The Met’s collection. The primary dataset for all of the visualizations was made publicly available by The Met and is a detailed listing of every object in the museum. These data were cleaned in OpenRefine and Microsoft Excel in order to eliminate all non-photographic objects and to remove all information except artist name, photographic process, and date of work. This cleaned and refined dataset was then used to generate the first two visualizations, which are described in more detail below.
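The cleaning itself was done in OpenRefine and Excel; purely as an illustration, the same reduction could be sketched in Python. The column names (`Department`, `Artist Display Name`, `Medium`, `Object Date`), the `"Photographs"` department value, and the helper name `keep_photographs` are assumptions made for this sketch, not details confirmed by the project.

```python
# Hypothetical column names; adjust them to match the actual export.
KEEP = ("Artist Display Name", "Medium", "Object Date")


def keep_photographs(rows, department="Photographs"):
    """Drop non-photographic objects and keep only artist, process, and date."""
    cleaned = []
    for row in rows:
        if row.get("Department") != department:
            continue  # skip objects from other departments
        cleaned.append({col: (row.get(col) or "").strip() for col in KEEP})
    return cleaned


# Placeholder rows for illustration only (not real collection records).
sample = [
    {"Department": "Photographs", "Artist Display Name": "Artist A",
     "Medium": "Albumen silver print", "Object Date": "1885", "Title": "Untitled"},
    {"Department": "Arms and Armor", "Artist Display Name": "Maker B",
     "Medium": "Steel", "Object Date": "1550", "Title": "Helmet"},
]
cleaned = keep_photographs(sample)
```

With rows read from a CSV export (for example via `csv.DictReader`), the result would contain only the three fields used by the visualizations.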
Frequency Histogram Bar Chart
The primary goal of this visualization is to understand how the use of photographic processes has changed over time. For each decade beginning at the invention of photography, it shows the number of prints present in The Met’s collection for a given process. This was accomplished in Tableau Public by creating a bar chart with date of work in the columns shelf and number of prints in the rows shelf. Dates were grouped by decade, and photographic process was used as a filter, allowing users to select a specific process and inspect its frequency histogram. Because there are more than 60 distinct photographic processes present in The Met’s collection, processes were grouped and limited to 15 primary processes as determined by the author’s knowledge of photography. Processes outside this primary set of 15 that were present in the collection were grouped into “other processes.”
Design for this visualization is based on the concept that a higher data-ink ratio, as defined by Edward Tufte, is ideal. Because only one variable needed to be displayed, color was eliminated and light grey was used for the bars. This light grey is also meant to subtly reference photographic processes, many of which are greyscale or use silver salts. Titles for the axes and the bar chart itself were initially present, but based partially on in-class feedback and reinforced by user research, they were removed in favor of a separate caption that gives this information to users in plain language. Finally, a mouse over is used to reveal the absolute number of prints in the collection for each decade and process. A mouse over was used because this information is useful but secondary to the primary goal of understanding photographic processes in the collection over time.
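The decade grouping and the “other processes” bucket were built in Tableau Public. As a rough sketch of the same logic, the Python below bins years into decades and counts prints per decade for one process. The function names and the placeholder records are assumptions for illustration only.

```python
from collections import Counter


def decade_of(year):
    """Return the first year of the decade containing `year` (1887 -> 1880)."""
    return (year // 10) * 10


def group_process(name, primary):
    """Keep a primary process name; fold everything else into one bucket."""
    return name if name in primary else "Other processes"


def prints_per_decade(records, process=None):
    """Count prints per decade, optionally restricted to a single process.

    `records` is an iterable of (process, year) pairs.
    """
    counts = Counter()
    for proc, year in records:
        if process is None or proc == process:
            counts[decade_of(year)] += 1
    return dict(sorted(counts.items()))


# Placeholder records for illustration only.
records = [
    ("Albumen silver print", 1881),
    ("Albumen silver print", 1889),
    ("Albumen silver print", 1865),
    ("Gelatin silver print", 1936),
]
albumen = prints_per_decade(records, process="Albumen silver print")
```

Passing `process=None` corresponds to viewing the whole collection with no filter applied.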
Artist Bubble Chart
Although the frequency histogram bar chart is effective in helping users understand the distribution of photographic processes in The Met’s collection over time, it has no information concerning the artists that created the works. In order to deepen users’ understanding of the collection, a bubble chart was created that displays all of the artists working in a given process. This chart was also created in Tableau Public and uses the same filter as the frequency histogram bar chart. Bubbles are sized according to the number of prints attributed to an artist for a given process, with larger bubbles representing artists with more prints present in the collection.
This chart is linked to the frequency histogram bar chart interactively by the process filter, but also in regard to design. It uses the same grey value as the bar chart and relies on caption and mouse over information in similar ways. The mouse over information becomes more critical in this chart because many of the bubbles are too small to include artist name and number of prints, so users must rely on the mouse over to fully appreciate the visualization. Because of this, the visualization was designed, tested, and meant to be viewed on a traditional computer, rather than a tablet or phone on which mouse overs are not possible.
In order to connect the above visualizations, they were combined into a dashboard within Tableau that features them next to one another with the process filter above. Added to this dashboard were a title for each of the visualizations and instructions for how to navigate the filter. Initially, the process filter was located at the bottom of the dashboard, but based on user feedback it was moved to the top and centered. Users wanted it to be at the top as the first object encountered because it is the driving force of the visualizations. When the dashboard is shared as a link, separate pages with the single visualizations are also available to users; however, this is not immediately evident, so instructions for this were placed at the bottom of the dashboard. This dashboard can be viewed here.
Photographic Process Network
The goal of this visualization is to understand how photographic processes might be related to one another based on the artists that work in them. In order to do this, a network of all artists in The Met’s Photography Collection was created. In order to understand potential relationships among processes, artists were linked to one another if they worked in the same process.
To create this network, the dataset used in the above Tableau visualizations was transformed in Microsoft Excel and R to create an edge table in which each artist in the collection is linked to every other artist with whom they share one or more processes. If two artists shared only one process, the edge was given a weight of 1; if they shared two processes, a weight of 2; and so on. This undirected edge table was loaded into Gephi, and a node table for each artist was generated based on the edge table.
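The edge table was built in Excel and R. As a minimal sketch of the weighting rule described above, written in Python rather than R, the function below links every pair of artists who share a process and sets the weight to the number of shared processes. The function name, input shape, and placeholder artists are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_edge_table(pairs):
    """Build undirected weighted edges between artists who share processes.

    `pairs` is an iterable of (artist, process) tuples. The weight of an
    edge is the number of distinct processes the two artists have in common.
    """
    artists_by_process = defaultdict(set)
    for artist, process in pairs:
        artists_by_process[process].add(artist)  # sets ignore repeated prints

    weights = defaultdict(int)
    for artists in artists_by_process.values():
        # Sorting gives each undirected pair one canonical (source, target) order.
        for a, b in combinations(sorted(artists), 2):
            weights[(a, b)] += 1
    return [(a, b, w) for (a, b), w in sorted(weights.items())]


# Placeholder data for illustration only.
pairs = [
    ("Artist A", "Cyanotype"), ("Artist A", "Gelatin silver print"),
    ("Artist B", "Cyanotype"), ("Artist B", "Gelatin silver print"),
    ("Artist C", "Gelatin silver print"),
]
edges = build_edge_table(pairs)
```

Each resulting tuple corresponds to one row of a source, target, weight edge list. Because every pair of artists within a process is linked, widely used processes contribute a very large number of edges, which is consistent with the edge count reported for the full network.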
Because of the size of the network (more than 2,500 nodes and 1,000,000 edges), the network was produced using the ForceAtlas 2 layout with a gravity of 1 and a scaling of 2. The average degree was approximately 842, meaning that on average each artist is connected to 842 others via a shared process. The density of the graph was 0.329, meaning that roughly a third of all possible artist pairs are linked and that many artists who could potentially be linked via process are not. Modularity was generated with a resolution of 1.7, resulting in 13 clusters. This value was chosen so that the number of clusters formed was a close approximation of the number of major types of photographic processes present in the other visualizations. Nodes were sized based on degree and colored based on modularity class. A palette of 13 distinct colors was chosen so that no two clusters share a color. Initially, edges were drawn using blended colors, but ultimately they were removed. The edges became so visually dense that they occluded smaller nodes and clusters altogether and, when exported as a PDF, yielded a solid block of color rather than distinct lines. Because the shape, size, and location of the clusters are the most important aspects of this visualization, only a small amount of information is lost by removing the edges.
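The average degree and density reported above are related by simple arithmetic for an undirected graph, which the following Python sketch spells out. It is a consistency check on the rounded figures in the text, not a reproduction of Gephi’s own calculation, and the function names are hypothetical.

```python
def average_degree(n_nodes, n_edges):
    """Average degree of an undirected graph: each edge touches two nodes."""
    return 2 * n_edges / n_nodes


def density(n_nodes, n_edges):
    """Share of all possible undirected edges that are actually present."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def implied_node_count(avg_degree, graph_density):
    """Since density = avg_degree / (n - 1), n can be recovered from both."""
    return avg_degree / graph_density + 1


# Using the rounded figures reported above (842 and 0.329).
n_estimate = implied_node_count(842, 0.329)
```

The estimate comes out a little above 2,500 nodes, in line with the stated size of the network.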
After the network graph was constructed, it was necessary to understand the groupings in relation to photographic process. To do this, a lookup table was generated in Microsoft Excel that listed every artist in the network and all of the processes in which they have worked. Individual nodes were inspected in Gephi and matched to the lookup table to reveal which cluster represented a particular process. Labels were placed on the exported PDF to designate particular processes, along with a title and a brief description of how to read the graph. Based on user feedback, the label for the large multicolored cluster in the center of the graph was changed from “Albumen and Gelatin Silver Prints” to “Large Diversity of Prints,” as the initial label was determined to be misleading. The nodes for artists in the center cluster are larger than others because these artists have more connections, having worked in many processes, likely including but not limited to albumen silver and gelatin silver.
Ultimately, a static, vector-based output was used for the final visualization. This allows users to absorb the entire chart at once while still being able to zoom in and out without loss of quality. The PDF format was chosen so that the graph is easily accessible to anyone, and an interactive display was forgone so as to avoid overcomplicating an already complex graph. This graph can be seen below.
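The matching of clusters to processes was done by hand, by inspecting nodes in Gephi against the Excel lookup table. A programmatic alternative, sketched here in Python under the assumption that each artist’s modularity class has been exported alongside the lookup table, would tally the processes used by the artists in each cluster. All names and data below are hypothetical.

```python
from collections import Counter


def processes_by_cluster(cluster_of, processes_of):
    """Tally, for each cluster, how many member artists used each process.

    `cluster_of` maps artist -> modularity class; `processes_of` maps
    artist -> set of processes (the lookup table).
    """
    tallies = {}
    for artist, cluster in cluster_of.items():
        tally = tallies.setdefault(cluster, Counter())
        tally.update(processes_of.get(artist, ()))
    return tallies


# Placeholder data for illustration only.
cluster_of = {"Artist A": 0, "Artist B": 0, "Artist C": 1}
processes_of = {
    "Artist A": {"Cyanotype", "Gelatin silver print"},
    "Artist B": {"Cyanotype"},
    "Artist C": {"Daguerreotype"},
}
tallies = processes_by_cluster(cluster_of, processes_of)
```

The most common process in each tally suggests a candidate label for that cluster, while a cluster with no dominant process would resemble the mixed central cluster described above.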
Three users from various backgrounds were recruited to test and provide feedback on the above visualizations. Below are descriptions of each user as well as rationale for why they were chosen.
User 1 is a business professor with a background in computer science and a specialty in statistical forecasting and Microsoft Excel. This user was chosen because he has a high competency in data analysis and visualization but no knowledge of photography. His feedback was used to assess the accuracy and clarity with which the visualizations displayed the underlying data without particular concern for the subject matter.
User 2 is an assistant chair and professor in a photography department with overall knowledge of photographic history and processes and a specialty in instruction and curriculum development. She is not formally trained in any form of data analysis or visualization techniques, but is highly visually literate and design savvy. Because these visualizations could potentially be used in the classroom, her feedback was used to assess how effective they are from an instructional standpoint.
User 3 is an art history professor and writer with expertise in photographic history. She has little to no prior experience with data analysis and visualization, but is deeply involved in photographic research and study. Her feedback was used to understand how these visualizations might aid in research endeavors or be used within the academic publishing landscape.
Methods of Research
In order to assess the visualizations, each user was sent an email containing a link to the Tableau dashboard and a PDF of the network chart. Each user then had a 20-30 minute phone conversation with the author while investigating the visualizations for the first time on their home computer. A combination of task-completion questions (e.g. Which processes are more closely related: chromogenic and inkjet prints or chromogenic and salted paper prints?) and more open-ended questions (e.g. What can you tell me about The Met’s Photography Collection?) was asked. Hypothetical content questions were asked (e.g. Would a glossary be helpful?), as were design-based questions (e.g. What does your eye go to first? Where do you move your mouse?). Notes were taken as users navigated the visualizations, and these findings were incorporated into design modifications and future recommendations, both outlined below.
Photographic Processes Throughout History
Several things can be learned from the frequency histogram bar chart of The Met’s Photography Collection. Viewing the entirety of the collection, we can see that the 1880s is the decade from which the most prints have been collected by the museum. The 1860s, 1890s, and 1970s are the next three most widely collected decades. We can also see that, in absolute number of prints, the museum holds fewer photographs from each decade after the 1970s.
Inspecting specific processes shows us that albumen silver prints are the most widely collected in absolute number of prints and that more of those were produced in the 1880s than in any other decade. Inkjet and instant prints are the most recent processes, as none were created prior to 1950, and gelatin silver prints experienced a heyday from the 1920s through the 1970s. For each process, users can quickly and easily see the decades in which the prints ultimately collected by The Met were produced.
Photographic Processes and Artists
The bubble chart displaying the artist makeup of each photographic process in The Met’s collection reveals that, across the entire collection, Walker Evans, Unknown, Goodwin & Company, and W. Duke, Sons & Co. have the most prints. Filtering this bubble chart by specific process reveals that certain processes are disproportionately dominated by one artist. For example, Walker Evans has far more instant prints in the collection than any other artist, and the same is true of Anna Atkins and cyanotypes. Based on the size of the charts, we can see that more artists work in gelatin silver and albumen silver than in any other process in the collection and that the majority of tintypes are from unknown artists.
Relationship Among Photographic Processes
Understanding the relationship among photographic processes present in The Met’s collection via a network graph is a complex but potentially fruitful endeavor. Without prior knowledge of network graphs, users can be confused as to what they are looking at or for; however, upon close investigation of the visualization, we can gain a number of insights into The Met’s Photography Collection.
Initially we can see that the bulk of the collection is composed of artists working in either gelatin silver or albumen silver. From the previous visualizations we know that albumen silver has the most prints in absolute number, whereas gelatin silver has more artists that have used it. Within these two major categories, there are not many artists that work in both, but those that do work in both (as well as many other processes) sit at the center of the graph and are connected to the most other artists because of the diversity of processes in which they work.
Based on the placement of clusters in relation to one another, we can surmise that chromogenic, inkjet, and silver dye bleach prints are likely more closely related to each other than to albumen silver prints, salted paper prints, and daguerreotypes. This aligns with formal inspection of these prints, as the former are all color processes and the latter are greyscale. It also aligns with the frequency histogram bar chart, which shows that chromogenic, inkjet, and silver dye bleach prints are all relatively modern processes and that albumen silver prints, salted paper prints, and daguerreotypes are all much older.
We can also see that several processes, such as instant prints and “other” processes, are not strongly connected to others. This could indicate that not as many artists use those processes, and that those who do tend to work exclusively in them.
In general, the more time spent with this network graph, the more thoroughly it is understood and the more information one can gain from it. A particularly instructive example of how to understand this graph comes from citing a specific artist. The purple node that has separated from the gelatin silver cluster and nestled between “other” processes and gelatin silver prints and platinum and gelatin silver prints is Adam Fuss. Fuss has worked in gelatin silver, silver dye bleach, and daguerreotypes, which helps explain the position of this node.
Clarity, Usability, and Information Richness
Users unanimously found the frequency histogram bar chart and bubble chart more intuitive and user friendly than the network graph. They were more familiar with these kinds of visualizations and therefore understood them more clearly. Positive feedback included that the captions were written in clear language, were easy to understand, and were helpful, and that the interactivity of the filter made the user experience more fun. As constructive feedback on these two visualizations, one user wanted bars for decades with a low number of prints to be more easily visible, and most users wanted a clearer visual link between the two charts. Surprisingly, when asked, no users wished to be given more information about the specific nature of the photographic processes themselves. Those that were not in the field didn’t care about the specifics of a process, and those that were familiar with the field already knew them or wouldn’t use this platform as a way to get that information. One user initially misinterpreted the frequency histogram bar chart as showing date of acquisition rather than date of creation; however, this was quickly remedied by reading the caption below the chart.
Users were generally both aesthetically interested in and intellectually intimidated by the photographic processes network graph. Initially it was not clear to anyone exactly what they were looking at, and users needed to read the description carefully and slowly multiple times. Even after reading the description, users required more information on how to read the graph before they could extract more than cursory observations. Once this was provided, they began to understand more clearly how the graph was formed and what information could be gained from it. Some confusion also arose from the coloration based on modularity class. There was a desire to tightly link color to photographic process; however, because color was derived from connectivity, there was only a loose correlation between color and process, which was generally frustrating and confusing to users. Users were able to get the major points of the graph on their own, but needed guidance to tease out some of the more subtle pieces of information. Overall, users thought the chart “looked cool” and was mildly informative, but craved more concrete information.
Design feedback from users resulted in several small but significant alterations to the initial visualizations. One user wanted the process filter in the Tableau dashboard to be the first object visually encountered, which resulted in it being placed at the top rather than the bottom. The first user tested wasn’t clear that he could choose each process, so instructions for how to do this were added. A user questioned the lack of color and wanted color to be a way to link the bar chart and the bubble chart. It was suggested that bubbles could also function as pie charts showing the decade distribution of work for a given artist, with the same colors highlighting the corresponding bars in the frequency histogram. This would be potentially useful, but only when bubbles are large enough to read as pie charts.
Design feedback provided for the photographic processes network graph was minimal, as most of the conversation centered on clarity and information richness. Users found it useful to be able to zoom into the graph to examine small nodes and clusters, and thought the labels were helpful but potentially misleading, as they don’t all correlate to a specific color.
Based on the author’s and users’ assessments of the visualizations of The Met’s Photography Collection, there are several areas that could be improved or expanded on. In regard to the frequency histogram bar chart and bubble chart, future iterations should form a clear visual link between the two. This could be done by incorporating date information into the bubble chart via colors that correspond to colors of bars in the frequency histogram. This would expand users’ knowledge by showing not only which artists used a particular process, but also when they did so.
In order to make the photographic processes network graph more effective and intuitive, nodes could be colored based strictly on photographic process (or combination of processes) rather than modularity class. This would result in more color categories, but would be more intuitive to users. Along with this, an interactive version of the network could be generated so that individual artists could be chosen to reveal the processes in which they worked. Finally, providing users with more detailed information on how to read and understand a network graph would likely increase user understanding and satisfaction.
Beyond the specific alterations that could be made to the current visualizations, more research and analysis could be done to enrich the project as a whole. A similar analysis of other museum collections (MOMA, Tate Modern, SFMOMA, LACMA, etc.) could be done and then contrasted against The Met’s in order to understand how different institutional collections are related and/or biased towards certain time periods, artists, mediums, etc. Another line of inquiry would be to contrast The Met’s Photography Collection with its collection of other mediums (painting, sculpture, drawing, etc.) to create a more well-rounded visualization of The Met as a whole, rather than just a specific subset of its collection.