Comments on Byte Rot: "Five crazy abstractions my Deep Learning word2vec model just did"
Anonymous (2016-12-01):
Can anyone send me a little Gensim code example for word2vec with a small corpus?

Anonymous (2016-12-01):
I am interested in implementing word2vec for the Urdu language. Can I use gensim, and how can I use my own corpus with it?

aliostad (2016-05-31):
I never had this issue, since I worked with text extracted from web documents. I guess you could use word2vec itself to fix a lot of scan errors: check each word against a dictionary and, if it does not match, see which word it is most strongly connected to. The problem is that these misspelt words probably do not occur often enough. I would consult the literature for methods; I am sure you are not the only one with this problem. But I am afraid I do not have any experience with this kind of problem.
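The dictionary-check idea above can be sketched in a few lines. This is a hypothetical illustration, not code from the post: the toy `dictionary` of word frequencies and the `correct_token` helper are invented for the example, and string similarity (Python's `difflib`) stands in for the "see which word it is most strongly connected to" test, which a real pipeline might do in vector space.

```python
from difflib import get_close_matches

# Toy dictionary of known-good words with their corpus frequencies.
dictionary = {"market": 120, "stock": 95, "thermometer": 7, "temperature": 40}

def correct_token(token, dictionary):
    """Return token unchanged if it is a known word; otherwise
    replace it with the closest dictionary entry, if any."""
    if token in dictionary:
        return token
    # get_close_matches ranks candidates by string similarity (SequenceMatcher).
    candidates = get_close_matches(token, dictionary, n=3, cutoff=0.75)
    if not candidates:
        return token  # leave rare/unfixable tokens alone
    # Prefer the most frequent candidate: a crude stand-in for
    # checking which word the token is most connected to.
    return max(candidates, key=lambda w: dictionary[w])

print(correct_token("rnarket", dictionary))  # OCR confusion of 'm' -> 'rn'; prints "market"
print(correct_token("stock", dictionary))    # known word, unchanged
```

As the comment notes, the weakness of any such scheme is that scanning errors are rare tokens, so there is little distributional evidence to lean on.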
Anonymous (2016-05-31):
Thanks for your article. I work with word2vec on a corpus spanning 200 years. This corpus contains a lot of scan mistakes, and the accuracy of the model is quite low. You wrote about this problem just before the conclusion; could you expand on it a little? Did you fix this issue?

aliostad (2016-01-11):
Well, in fact the results are pretty good. One problem is that you would need to normalise the data; the casing throws it off.

+ human - animal: this came from the Persian corpus I had, which had a lot of spiritual content, so I guess that was why.
+ library - books: the result is strange; what is Terraceview Lodge anyway?!
+ Obama + Russia - USA: strange that Medvedev is first, but well, Putin is second.
+ Iraq - violence: this works.
+ President - power: this also works, if casing is normalised.
+ politics - lies: again, my corpus was Persian. But it is interesting that it comes up with partisan politics.

Anonymous (2016-01-11):
word2vec is great, but none of your results correspond to the gensim results on the Google News corpus, except (almost) #4. Here are my top-3 results for your examples, and the code that generated them.
I would add a test for "stock market" ~ "thermometer", but the "stock_market" token does not appear in the corpus.

+ human - animal = mankind humankind humanity
+ library - books = Library Terraceview_Lodge rec_center
+ Obama + Russia - USA = Medvedev Putin Kremlin
+ Iraq - violence = Kuwait Iraqi Chalabi
+ President - power = president Vice_President Presdient
+ politics - lies = partisan_politics Politics political

    from gensim.models import Word2Vec

    model = Word2Vec.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

    def similar(positive, negative):
        results = model.most_similar(positive=positive, negative=negative, topn=3)
        print ' '.join(['+ ' + x for x in positive] +
                       ['- ' + x for x in negative] + ['='] +
                       [result[0] for result in results])

    similar(['human'], ['animal'])
    similar(['library'], ['books'])
    similar(['Obama', 'Russia'], ['USA'])
    similar(['Iraq'], ['violence'])
    similar(['President'], ['power'])
    similar(['politics'], ['lies'])

Nikos Souris (2015-07-17):
Dear Rot,
Excellent blog! I find your posts very interesting, especially the ones on Machine Learning.

I am one of the executive editors at .NET Code Geeks (www.dotnetcodegeeks.com), a sister site to Java Code Geeks (www.javacodegeeks.com). We have the NCG program, which aims to build partnerships between .NET Code Geeks and community bloggers (see http://www.dotnetcodegeeks.com/join-us/ncg/), and I think you'd be perfect for it.

If you're interested, send me an email at nikos[dot]souris[at]dotnetcodegeeks[dot]com and we can discuss further.

Best regards,
Nikos Souris

Unknown (2015-06-17):
Thanks.

aliostad (2015-06-17):
Gensim does not care: you provide it a list (or any iterable) of segmented words. Bigram handling happens at the segmentation phase. I first segment the document into sentences, then segment each sentence into words, using a dictionary of named entities and phrases that are to be treated as a single token. I believe this is very important, yet all the examples I have seen use simple tokenisation. This way, "Barack Obama" becomes a single token.
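The phrase-merging step described above can be sketched as follows. This is a hypothetical illustration: the `PHRASES` table and `segment` helper are invented for the example; a real pipeline would build the phrase dictionary from a named-entity list, or learn it with a tool such as gensim's `Phrases`.

```python
# Greedily merge known multi-word entities/phrases into single tokens
# before handing sentences to word2vec.
PHRASES = {("barack", "obama"): "barack_obama",
           ("stock", "market"): "stock_market"}

def segment(sentence, phrases=PHRASES, max_len=2):
    words = sentence.lower().split()
    tokens, i = [], 0
    while i < len(words):
        # Try the longest phrase starting at position i first.
        for n in range(max_len, 1, -1):
            chunk = tuple(words[i:i + n])
            if chunk in phrases:
                tokens.append(phrases[chunk])
                i += n
                break
        else:
            tokens.append(words[i])  # no phrase matched; keep the single word
            i += 1
    return tokens

print(segment("Barack Obama spoke about the stock market"))
# -> ['barack_obama', 'spoke', 'about', 'the', 'stock_market']
```

With this in place, "barack_obama" and "stock_market" get their own vectors, which is exactly how multi-word tokens such as `partisan_politics` appear in the Google News results quoted earlier in the thread.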
Unknown (2015-06-17):
How do you calculate the vector for a bigram? You gave the example of "stock market": did you average the vectors for "stock" and "market"? I ask because gensim takes only unigrams, so in the end I have vectors only for unigrams.

David (2015-06-15):
Was it really necessary to be so hostile? You can get similar results from the word2vec website.

Anonymous (2015-06-14):
God damnit, "it's very easy", "nothing magic about it", "so simple"... cut the crap and release the source, preferably on GitHub. A .zip file does the job too, in case you don't know anything about git.

aliostad (2015-06-14):
There is nothing magic about the code I have written. The word2vec part is around 10-20 lines of code, similar to what you find at https://radimrehurek.com/gensim/models/word2vec.html. The key is to have a corpus. You can try some freely available corpora that are part of NLTK.

Anonymous (2015-06-14):
Is there a chance to find your code on GitHub or similar websites?