Subsets ======= The Subsets tab can display the full subset of your choice for the selected corpus. Therefore, you can retrieve all quotes or all long suspensions, etc. in any of the books or pre-selected corpora for further analysis. Note that we find this option most useful for the smaller subsets, i.e. quotes and suspensions; if you select the whole 'non-quotes' subset the output may become unwieldy. .. rubric:: Show subsets :name: show-subsets Click onto the dropdown **'Show subsets'** (see :numref:`figure-analysis-subsets-show-options`) to select a relevant subset (short suspensions, long suspensions, quotes or non-quotes). You will also need to choose a corpus. .. _figure-analysis-subsets-show-options: .. figure:: ../images/figure-analysis-subsets-show-options.png The basic subset options :numref:`figure-analysis-subsets-show-longsuspensions` shows sample lines from the subset of long suspensions in *Oliver Twist*. You can then use the filter option to narrow down the lines and group them using the KWICGrouper. For subsets, the "relative frequency" is not given in terms of frequency per million words, as in the Concordance tab, but as the percentage of total words in the corpus found in the selected subset. .. _figure-analysis-subsets-show-longsuspensions: .. figure:: ../images/figure-analysis-subsets-show-longsuspensions.png The first few lines from the subset of 'long suspensions' in Oliver Twist .. rubric:: Results :name: results-1 Like in the concordance tab, this allows you to adjust the way the concordance output ('table') is displayed. .. rubric:: Filter rows :name: filter-rows-1 The filter option lets you filter the output by the rows that contain a particular sequence of letters, as described in the :ref:`Filter rows` subsection of the Concordance tab documentation. For example, you could filter suspensions for particular speech verbs like *cried* (:numref:`figure-analysis-subsets-results-filter-cried`). .. _figure-analysis-subsets-results-filter-cried: .. figure:: ../images/figure-analysis-subsets-results-filter-cried.png Filtering long suspensions in Oliver Twist for *cried* .. _figure-analysis-subsets-results-filter-cotext: .. figure:: ../images/figure-analysis-subsets-results-filter-cotext.png Filtering the co-text of long suspensions for *perhaps* in Oliver Twist Note, however, that the filter will search through the whole row and therefore also accounts for words in the context, not only in the subset itself. For example, when searching through the subset of long suspensions in *Oliver Twist* and filtering rows for *perhaps* the results originate only from the co-text, as *perhaps* does not occur in long suspensions (see :numref:`figure-analysis-subsets-results-filter-cotext`). .. rubric:: View as :name: view-as-1 Like the :ref:`View as` options for the Concordance tab, in Subsets you can view the 'Basic results' (concordance lines; book short title; link to 'in bk.' view) the 'full metadata' (+ chapter, paragraph & sentence numbers) or the 'distribution plot', which gives an overview of matching lines per book. .. _figure_distribution_plot_quotes: .. figure:: ../images/figure_distribution_plot_quotes.png The distribution plot view in the Subsets tab In the case of the Subsets tab, these lines obviously are not concordance lines, but instances of the subset e.g. a quote or non-quote element or a short/long suspension. When you then create a distribution plot of a selected subset, you will therefore see how the subset is distributed across a book or corpus. Note that this operation may take a moment to load for a large corpus. :numref:`figure_distribution_plot_quotes` gives an example of three particular books with rather distinct quote distributions: whereas *Pride and Prejudice* contains a lot of dialogue – as you can see from the white quote subsets interspersed by grey non-quotes – both *The Time Machine* and *Heart of Darkness* contain much longer quote chunks by a key character telling a story. .. rubric:: KWICGrouper :name: kwicgrouper-1 If you want to restrict your search to the subset itself, the KWICGrouper is the better option; it will also highlight your search terms, as described in the :ref:`Concordance` section. The Subset KWICGrouper works like the Concordance KWICGrouper, with the exception of its search span which operates only on the subset itself. See :numref:`figure-analysis-subsets-kwicgrouper-criedscreamedsobbed` for an illustration of the Subset KWICGrouper searching for lines with *cried*, *screamed* and *sobbed*. .. _figure-analysis-subsets-kwicgrouper-criedscreamedsobbed: .. figure:: ../images/figure-analysis-subsets-kwicgrouper-criedscreamedsobbed.png The search span of the Subset KWICGrouper applies to the subset; not to the co-text .. rubric:: Manage tag columns :name: manage-tag-columns-1 .. _figure-analysis-subsets-tagcolumns-gender: .. figure:: ../images/figure-analysis-subsets-tagcolumns-gender.png Tagging subsets – here, long suspensions in ChiLit containing *cried* are tagged for character gender Just like in the Concordance tab (see :ref:`Concordance`), subset rows can be annotated with user-defined tags. :numref:`figure-analysis-subsets-tagcolumns-gender` shows a potential application of tagging subsets: long suspensions in the 19th Century Children's Literature (ChiLit) corpus containing *cried* are tagged for whether the crying character is male or female. Note that this screenshot just illustrates the technique; it does not represent the actual gender distribution of *cried* in the ChiLit long suspensions.