A guideline for the methodology:
2.1 Research Design
- Approach: Mixed-methods, combining quantitative modeling with qualitative interpretation.
- Justification: Language change is both a social and a mathematical phenomenon; blending corpus linguistics, statistical modeling, and sociolinguistics makes it possible to measure how a term spreads and to interpret why it spreads.
- Scope: Focus on lexical innovation in digital communication, especially words, memes, hashtags, and emojis that emerged in the last 10–15 years.
2.2 Data Collection
- Corpora Sources
  - Twitter/X: rapid spread of neologisms, hashtags, memes.
  - Reddit: community-specific slang, jargon, and lexical innovation.
  - TikTok: viral audiovisual trends where words and hashtags spread quickly.
  - Online news & dictionaries: to track when internet words cross into mainstream use.
- Selection Criteria
  - Time-stamped records, so that adoption and decline can be observed.
  - Terms with a measurable increase in frequency (checked with Google Ngram or word-frequency APIs).
  - Words representing different categories: slang (e.g., yeet), technological neologisms (e.g., selfie), multimodal units (e.g., emoji).
- Ethical Considerations
  - Data anonymization.
  - Compliance with API terms and platform privacy policies.
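The collection and anonymization steps above can be sketched as a small record-cleaning routine. This is a minimal sketch under stated assumptions, not a prescribed pipeline: the field names (`user_id`, `created_at`, `platform`, `text`), the key, and the sample record are all hypothetical. A keyed hash only pseudonymizes the author, and the post text itself may still identify a person, so this is one possible starting point rather than full anonymization.

```python
import hashlib
import hmac
from datetime import datetime, timezone

# Hypothetical secret key; it would be stored separately from the dataset.
SECRET_KEY = b"replace-with-a-private-key"

def pseudonymize(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash (HMAC-SHA256, truncated) of a user identifier."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]

def clean_record(raw: dict) -> dict:
    """Keep only the fields needed for frequency analysis and drop the raw user ID."""
    ts = datetime.fromtimestamp(raw["created_at"], tz=timezone.utc)
    return {
        "author": pseudonymize(raw["user_id"]),
        "month": ts.strftime("%Y-%m"),  # time bucket for adoption curves
        "platform": raw["platform"],
        "text": raw["text"],
    }

# Invented sample record (assumed schema, Unix seconds for the timestamp).
record = clean_record({
    "user_id": "example_user",
    "created_at": 1400000000,
    "platform": "reddit",
    "text": "yeet",
})
```

Bucketing timestamps by month at this stage keeps the later frequency tracking simple and avoids storing exact posting times.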
2.3 Techniques of Analysis
- Corpus Linguistics Methods
  - Frequency tracking: observing how often new words or expressions appear across a time span.
  - Time-series analysis: identifying adoption curves (rise, peak, decline).
  - Keyword comparison: examining how lexical items spread differently across online platforms.
- Model Evaluation
  - Fitting observed lexical trends to established models (e.g., S-curve of diffusion).
  - Assessing how well models reflect real-world data using simple accuracy checks (e.g., comparing predicted vs. actual adoption rates).
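The techniques above can be illustrated end to end: count a term per period, fit a logistic S-curve, and check the fit with a root-mean-square error. This is a rough sketch, assuming the saturation level `K` is known in advance and using a log-odds linear fit in place of a full nonlinear fit; the function names and all data are invented for illustration. A plain logistic curve covers only the rise and plateau, so the decline phase would need a separate model.

```python
import math
from collections import Counter

def relative_frequency(posts, term):
    """Occurrences of `term` per 1,000 tokens, grouped by period label.

    Uses naive whitespace tokenization, so "yeet!" would not match "yeet".
    """
    hits, totals = Counter(), Counter()
    for period, text in posts:
        tokens = text.lower().split()
        hits[period] += tokens.count(term)
        totals[period] += len(tokens)
    return {p: 1000 * hits[p] / totals[p] for p in sorted(totals) if totals[p]}

def logistic(t, K, r, t0):
    """S-curve with saturation K, growth rate r, and midpoint t0."""
    return K / (1 + math.exp(-r * (t - t0)))

def fit_logistic(ts, ys, K):
    """Estimate r and t0 by least squares on log(y / (K - y)), with K assumed known."""
    pts = [(t, math.log(y / (K - y))) for t, y in zip(ts, ys) if 0 < y < K]
    if len(pts) < 2:
        raise ValueError("need at least two points strictly between 0 and K")
    mt = sum(t for t, _ in pts) / len(pts)
    mz = sum(z for _, z in pts) / len(pts)
    r = sum((t - mt) * (z - mz) for t, z in pts) / sum((t - mt) ** 2 for t, _ in pts)
    t0 = mt - mz / r  # the fitted line is z = r * (t - t0)
    return r, t0

def rmse(actual, predicted):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Invented posts, only to show the shape of the frequency table.
posts = [("2014-01", "yeet that ball"), ("2014-01", "nice throw"), ("2014-02", "yeet yeet")]
freq = relative_frequency(posts, "yeet")

# Synthetic adoption curve (not real measurements): K=50, r=0.8, t0=6.
ts = list(range(12))
ys = [logistic(t, 50, 0.8, 6) for t in ts]
r, t0 = fit_logistic(ts, ys, K=50)
error = rmse(ys, [logistic(t, 50, r, t0) for t in ts])
```

With real, noisy counts the error would not be near zero, and comparing it across candidate models is one simple form of the predicted-vs-actual check.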
2.4 Case Studies (Applied Modeling)
- Case Study 1: Viral Internet Slang
  - Example: yeet → origin, peak, decline.
  - Fit adoption to logistic/S-curve model.
  - Compare lifespan across platforms.
- Case Study 2: Emoji as Lexical Units
  - Treat emojis as evolving lexicon.
  - Analyze substitution (😂 replacing “lol”).
  - Model adoption as competition between old and new forms.
- Case Study 3: Platform-Specific Spread
  - Compare Twitter vs. TikTok vs. Reddit.
  - Identify whether diffusion speed differs depending on platform structure.
  - Analyze role of algorithms (trending lists, recommendation engines).
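The competition between an old and a new form, and the comparison of diffusion speed across platforms, could be approached by tracking the new form's share of combined usage and estimating when it overtakes the old one. The sketch below is one possible approach, not the study's definitive method: the function names, platform labels, and counts are all invented, and the crossover point is found by simple linear interpolation between periods.

```python
def new_form_share(old_counts, new_counts):
    """Share of the new form among all uses of either form, per period."""
    shares = []
    for old, new in zip(old_counts, new_counts):
        total = old + new
        shares.append(new / total if total else None)  # None marks an empty period
    return shares

def crossover_time(shares, level=0.5):
    """First time (in period units, linearly interpolated) the share reaches `level`."""
    prev = None  # last (index, share) seen below `level`
    for i, s in enumerate(shares):
        if s is None:
            continue
        if s >= level:
            if prev is None:
                return float(i)
            j, sp = prev
            return j + (level - sp) * (i - j) / (s - sp)
        prev = (i, s)
    return None  # the new form never reached `level`

# Invented per-period counts for two hypothetical platforms.
platform_counts = {
    "platform_a": {"old": [90, 80, 60, 30, 10], "new": [10, 20, 40, 70, 90]},
    "platform_b": {"old": [95, 90, 80, 60, 40], "new": [5, 10, 20, 40, 60]},
}

crossovers = {
    name: crossover_time(new_form_share(c["old"], c["new"]))
    for name, c in platform_counts.items()
}
```

An earlier crossover on one platform suggests faster substitution there, though it would not by itself separate the effect of platform structure from the effect of recommendation algorithms.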
2.5 Limitations of Methodology
- Incomplete corpora (API restrictions).
- Noise in social media data (bots, spam).
- Oversimplification of human social behavior in models.
- Difficulty capturing semantic shifts beyond frequency counts.