generalize-document-model
v1.1.4
Pattern matching for human driven patterns
Generalize Document Model (GDM)
Overview
The Simplified DOM Generator is a JavaScript script that processes a web page's DOM and creates a simplified version of it, focusing on core attributes, styles, content, and event listeners. The script categorizes elements into predefined groups, such as textual, input, interactive, and more, and strips out unnecessary details to produce a cleaner, more manageable HTML structure. This is particularly useful for analyzing the structure of a web page, identifying key elements, and understanding event-driven interactions.
Features
- Categorization of Elements: Elements are categorized into groups like textual, input, interactive, button, media, image, and structural. This helps in organizing the DOM based on the type of content and its purpose.
- Style Simplification: Only a core set of CSS properties (e.g., `display`, `font-size`, `color`, etc.) are retained for each element, ensuring that the output is streamlined and focuses on essential visual attributes.
- Textual Content Analysis: The script identifies and categorizes textual content within elements, differentiating between regular text, complex text (e.g., numbers), and currency formats.
- Event Listener Detection: The script detects event listeners attached to elements and lists them as attributes in the simplified DOM. This feature helps in identifying interactive elements and understanding user interaction points on the page.
- Media Handling: For media elements like videos and images, the script captures relevant attributes like sources and aspect ratios.
- Structural Integrity: The script intelligently removes unnecessary structural elements that do not contain content or child nodes, further simplifying the output.
- JavaScript and CSS Filtering: The script identifies and excludes inline JavaScript and CSS content to prevent unnecessary code from cluttering the output.
How It Works
Element Categorization
Elements are categorized based on their tag names. The script includes predefined sets of tags for different categories:
- Textual Tags: Include elements like `<p>`, `<span>`, `<h1>`, etc.
- Input Tags: Include form elements like `<input>`, `<textarea>`, `<select>`, etc.
- Interactive Tags: Include elements like `<details>`, `<summary>`, etc.
- Button Tags: Include clickable elements like `<a>`, `<button>`, etc.
- Media Tags: Include multimedia elements like `<video>`, `<audio>`, etc.
- Image Tags: Include image elements like `<img>`, `<svg>`, etc.
- Structural Tags: Include layout elements like `<div>`, `<section>`, `<header>`, etc.
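As a rough illustration of tag-based categorization, the lookup could be sketched as below. The tag sets contain only the tags named in this README (the script's real lists are longer), and `TAG_CATEGORIES` and `categorize` are hypothetical names, not the script's own.

```javascript
// Hypothetical tag sets: only the tags this README names explicitly.
const TAG_CATEGORIES = {
  textual: new Set(["p", "span", "h1"]),
  input: new Set(["input", "textarea", "select"]),
  interactive: new Set(["details", "summary"]),
  button: new Set(["a", "button"]),
  media: new Set(["video", "audio"]),
  image: new Set(["img", "svg"]),
  structural: new Set(["div", "section", "header"]),
};

// Returns the category name for a tag, or null when the tag is not listed.
// Element.tagName is uppercase for HTML elements, so the name is lowercased first.
function categorize(tagName) {
  const tag = String(tagName).toLowerCase();
  for (const [category, tags] of Object.entries(TAG_CATEGORIES)) {
    if (tags.has(tag)) return category;
  }
  return null;
}
```

In a browser, this would be called as `categorize(element.tagName)`.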
Style Simplification
Only essential CSS properties are retained in the output. These include `display`, `font-size`, `color`, and a few others. This ensures that the simplified DOM focuses on the core visual characteristics of each element.
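A minimal sketch of this filtering step follows. The property list is an assumption: the README names only `display`, `font-size`, and `color`, and the remaining entries are taken from the example output further down. `CORE_PROPERTIES` and `simplifyStyles` are hypothetical names.

```javascript
// Assumed core property list, in the order seen in the README's example output.
const CORE_PROPERTIES = [
  "display",
  "font-size",
  "color",
  "flex-direction",
  "font-style",
  "font-weight",
  "text-decoration",
];

// `computed` is any object exposing getPropertyValue(name), for example the
// result of window.getComputedStyle(element). Properties with an empty value
// are skipped.
function simplifyStyles(computed, properties = CORE_PROPERTIES) {
  return properties
    .map((name) => [name, computed.getPropertyValue(name)])
    .filter(([, value]) => value !== "")
    .map(([name, value]) => `${name}: ${value};`)
    .join(" ");
}
```

Accepting any object with `getPropertyValue` keeps the function usable outside the browser, for instance with a stub in tests.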
Event Listener Detection
The script uses the `getEventListeners` function to detect event listeners attached to elements. This function is a Chrome DevTools console utility and is not available to regular page scripts, which is why the script must be run from the console. If an element has one or more event listeners, an `events` attribute is added to the corresponding simplified DOM element, listing the types of events (e.g., `click`, `keypress`).
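The step from listener data to an `events` attribute might look like the sketch below. It assumes the input has the shape returned by `getEventListeners(node)` in the DevTools console: an object keyed by event type, each value an array of listener records. The space separator and the name `eventsAttribute` are assumptions; the README does not say how multiple event types are joined.

```javascript
// Builds the value of the `events` attribute from a listener map.
// Returns null when no listeners are registered, so the caller can omit
// the attribute entirely.
function eventsAttribute(listenerMap) {
  const types = Object.keys(listenerMap).filter(
    (type) => listenerMap[type].length > 0
  );
  // Assumption: event types are joined with a single space.
  return types.length > 0 ? types.join(" ") : null;
}
```

In the DevTools console, this could be called as `eventsAttribute(getEventListeners(document.body))`.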
Text Content Analysis
The script analyzes the textual content of elements, distinguishing between:
- Text: Regular text without special formatting or numerical content.
- Complex: Text that contains numbers or other complex characters.
- Currency: Text that matches currency formats.
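The three categories above can be sketched as a small classifier. The regular expressions are assumptions made for illustration, since this README does not document the script's actual matching rules, and `classifyText` is a hypothetical name.

```javascript
// Assumed patterns: a leading currency symbol followed by a number,
// and "any digit" as the marker for complex text.
const CURRENCY_PATTERN = /^\s*[$€£¥]\s?\d[\d,]*(\.\d+)?\s*$/;
const HAS_DIGIT = /\d/;

// Currency is checked first because currency strings also contain digits.
function classifyText(text) {
  if (CURRENCY_PATTERN.test(text)) return "currency";
  if (HAS_DIGIT.test(text)) return "complex";
  return "text";
}
```

The result corresponds to the `type` attribute shown in the example output, where `QUICK VIEW` is labelled `text`.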
Handling Structural Elements
Structural elements like `<div>`, `<section>`, and others are only retained if they contain content or have child nodes. Empty structural elements are removed to simplify the output.
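One possible way to express this pruning is shown below, using a plain object tree rather than live DOM nodes. The node shape `{ category, text, children }` and the name `pruneEmptyStructural` are hypothetical. The sketch prunes bottom-up, so a structural node whose only children are themselves empty is removed as well; whether the script does the same is not stated here.

```javascript
// Returns a pruned copy of the node, or null when the node is an empty
// structural element. Children are pruned first (bottom-up).
function pruneEmptyStructural(node) {
  const children = (node.children || [])
    .map(pruneEmptyStructural)
    .filter((child) => child !== null);
  const hasText = typeof node.text === "string" && node.text.trim() !== "";
  if (node.category === "structural" && !hasText && children.length === 0) {
    return null;
  }
  return { ...node, children };
}
```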
Media Element Processing
For media elements, the script captures the sources (e.g., URLs of videos or images) and calculates the aspect ratio to retain important information while simplifying the rest.
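The aspect ratio calculation could be as simple as the following sketch. The rounding to three decimals and the name `aspectRatio` are assumptions; the README does not specify the output format.

```javascript
// `media` is any object with pixel dimensions, for example
// { width: video.videoWidth, height: video.videoHeight } or
// { width: img.naturalWidth, height: img.naturalHeight }.
// Returns width / height rounded to three decimals, or null when the
// dimensions are missing or zero (e.g. media that has not loaded yet).
function aspectRatio(media) {
  const { width, height } = media;
  if (!(width > 0) || !(height > 0)) return null;
  return Number((width / height).toFixed(3));
}
```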
JavaScript and CSS Filtering
Inline JavaScript and CSS are identified and excluded from the output, ensuring that the simplified DOM focuses purely on structure and content, without unnecessary code.
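A rough sketch of such a filter is given below. It assumes "inline JavaScript and CSS" covers `<script>` and `<style>` elements plus inline `on*` handler attributes; the script's actual detection logic is not described in this README, and both function names are hypothetical.

```javascript
// Assumed set of elements whose content is code rather than page content.
const CODE_TAGS = new Set(["script", "style"]);

// True when the element should be skipped entirely.
function isCodeElement(tagName) {
  return CODE_TAGS.has(String(tagName).toLowerCase());
}

// Drops inline handler attributes such as onclick or onmouseover from a
// plain { name: value } attribute map. Matches any name starting with "on".
function stripInlineHandlers(attributes) {
  return Object.fromEntries(
    Object.entries(attributes).filter(([name]) => !/^on/i.test(name))
  );
}
```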
How to Use
1. Paste the Script into DevTools
Open your browser's Developer Tools (usually by pressing `F12` or `Ctrl+Shift+I`), navigate to the Console tab, and paste the script into the console. Press Enter to execute the script.
2. Generate the Simplified DOM
The script will process the current page's DOM and generate a simplified HTML structure. The resulting HTML will be logged in the console, which you can copy or save as needed.
3. Analyze the Output
The simplified DOM will include only the most essential elements, styles, and event listeners, making it easier to analyze the structure of the page and understand how it operates.
Example Output
Given a button-like element with minimal data, such as:
<div class="quick-product__btn js-modal-open-quick-modal-7330436644982 small--hide" aria-expanded="true">
<span class="quick-product__label">QUICK VIEW</span>
</div>
The generalized DOM might look something like this:
<structural
classes="quick-product__btn js-modal-open-quick-modal-7330436644982 small--hide"
styles="display: block; font-size: 12.75px; color: rgb(255, 255, 255); flex-direction: row; font-style: normal; font-weight: 400; text-decoration: none solid rgb(255, 255, 255);"
events="click">
<textual
classes="quick-product__label"
styles="display: inline; font-size: 12.75px; color: rgb(255, 255, 255); flex-direction: row; font-style: normal; font-weight: 400; text-decoration: none solid rgb(255, 255, 255);"
type="text">
QUICK VIEW
</textual>
</structural>
This output shows the essential classes, styles, and the presence of a click event listener, without unnecessary attributes or inline JavaScript, which makes identifying and leveraging DOM patterns simpler and easier.
Conclusion
The Simplified DOM Generator is a powerful tool for web developers and designers who need to analyze and understand the structure of complex web pages. By focusing on the essential elements, styles, and interactions, it provides a clear and concise representation of the DOM, making it easier to identify key components and interactions.