{"id":3944,"date":"2018-03-07T18:04:01","date_gmt":"2018-03-07T17:04:01","guid":{"rendered":"https:\/\/mastercaweb.u-strasbg.fr\/?p=3944"},"modified":"2018-03-07T18:04:01","modified_gmt":"2018-03-07T17:04:01","slug":"word-vectors-nlp","status":"publish","type":"post","link":"https:\/\/mastercaweb.unistra.fr\/en\/actualites\/un-categorized\/word-vectors-nlp\/","title":{"rendered":"Word Vectors: The Foundation of Natural Language Processing (NLP)"},"content":{"rendered":"<p><strong>Deep learning<\/strong> has myriad applications. However, in the context of technical communication, <strong>natural language processing<\/strong> (<strong>NLP<\/strong>) is the most relevant, since it is powering the newest generations of machine translation, sentiment analysis, and voice synthesis. Let&#8217;s have a look at the topic of NLP. I was first introduced to it during a fascinating lecture by Dr Fran\u00e7ois Massion, at the University of Strasbourg, in 2017.<\/p>\n<h2>Word vectors &#8211; the foundation of NLP<\/h2>\n<p>Making linguistic data available to deep neural networks is the first challenge in any application of NLP. Here NLP practitioners often turn to word vectors. A blog article entitled \u201c<a href=\"https:\/\/blog.acolyer.org\/2016\/04\/21\/the-amazing-power-of-word-vectors\">The amazing power of word vectors<\/a>\u201d helped me make sense of some of the more opaquely mathematical concepts that are essential to understanding the topic.<br \/>\nThe following image represents a highly simplified example of a typical word vector.<br \/>\n<img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-3949 aligncenter\" src=\"https:\/\/mastercaweb.u-strasbg.fr\/wp-content\/uploads\/2018\/02\/cat_words.png\" alt=\"nlp word vectors\" width=\"200\" height=\"200\" \/><br \/>\nThe word vector representing the word \u201ccat\u201d consists of eight variables. Each variable carries some aspect of the total meaning of the word. 
This indicates how complex the concept of meaning really is, and circumventing sticky philosophical debates is something NLP excels at. It does so by quantifying meaning and dealing with similarities and differences in terms of geometrical relationships.<\/p>\n<h2>Leveraging high-dimensional vector spaces in NLP<\/h2>\n<p>In practice, most words can be transformed into word vectors consisting of anything between twenty-five and three hundred variables. Vectors of this type are often referred to as \u201cmultidimensional\u201d, with each variable representing one additional dimension of meaning. A vector consisting of three hundred variables is therefore a three-hundred-dimensional vector, and for high-level NLP applications, word vectors can have up to a thousand dimensions.<\/p>\n<p>To facilitate large vector calculations, individual variables are often given small floating-point (decimal) values to optimize processing resources and facilitate probability calculations. The word vector representing \u201ccat\u201d may look something like the following image in practice.<br \/>\n<img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-3951 aligncenter\" src=\"https:\/\/mastercaweb.u-strasbg.fr\/wp-content\/uploads\/2018\/02\/cat_float.png\" alt=\"nlp word vector float\" width=\"200\" height=\"200\" \/><br \/>\nSome machine learning operations specify initial conditions, or predefined values for variables. However, some of the most effective unsupervised deep learning structures, most notably deep neural networks, start with random variable values and narrow them down over successive iterations to within an acceptable range.<\/p>\n<p>In general, the higher the dimensionality of the space a word vector occupies, the easier it becomes to find similarities between words. This is why many applications of NLP involve statistical vector operations in high-dimensional vector spaces. 
This is especially true for older support vector techniques; however, the same principle still holds for deep neural networks.<\/p>\n<h2>NLP is basically stats&#8230; on steroids<\/h2>\n<p>Given that human beings have great difficulty in visually representing objects occupying more than three dimensions, it is not surprising that similarities between vectors are often depicted in graphs of no more than two or three dimensions. When machine learning techniques are applied to sets of word vectors in successive iterations, the spaces between related vectors decrease, and words start congregating in clusters. Advanced statistical techniques can then be used to establish relationships between specific word vectors, based on the geometrical relationships between them, which may include distances and angles.<\/p>\n<p>It\u2019s exactly these statistical techniques that <a href=\"https:\/\/www.deepl.com\/translator\">DeepL<\/a> and <a href=\"https:\/\/translate.google.com\">Google<\/a> Translate are using to find similarities between words from different languages, thus making machine translation possible. This is where the real magic happens, and exact techniques are often closely guarded; however, the principles behind them are in the public domain. I found <a href=\"http:\/\/cs224d.stanford.edu\">Stanford University\u2019s Deep Learning for Natural Language Processing<\/a> course particularly insightful. Please note that the course is very technical in nature, so I would only recommend it to those who are serious about NLP.<\/p>\n<p>It is worth pointing out that word vectors can be extended to incorporate whole clauses, sentences, or even paragraphs. The larger the sample set, the more complex the behaviors observed, including sentence-level grammatical relationships.<br \/>\nThe topics of machine learning and NLP are vast and far, far beyond the scope of a single blog post. 
More posts on machine learning are set to appear in the months ahead, so stay tuned.<\/p>\n<p><a href=\"https:\/\/www.linkedin.com\/in\/willem-beckmann\/\" target=\"_blank\" rel=\"noopener noreferrer\"><span style=\"font-weight: 400;\">Written by Willem Beckmann<\/span><\/a><\/p>\n<h3>Sources:<\/h3>\n<ul>\n<li><a href=\"https:\/\/blog.acolyer.org\/2016\/04\/21\/the-amazing-power-of-word-vectors\">https:\/\/blog.acolyer.org\/2016\/04\/21\/the-amazing-power-of-word-vectors<\/a><\/li>\n<li><a href=\"http:\/\/cs224d.stanford.edu\">http:\/\/cs224d.stanford.edu<\/a><\/li>\n<\/ul>\n","protected":false}}