
The Complaining and Bitching Thread

Discussion in 'General Discussions' started by ASU2003, Jan 14, 2013.

  1. Lindy

    Lindy Moderator Staff Member

    Location:
    Nebraska
    I could never do that. By the time I'm finished fussing with my first draft, it's no longer a first draft.:rolleyes:
    Since the only editing I get is self-editing, I would love to see your revision sheet. Would you be willing to share, publicly or privately?
     
    • Like x 2
  2. Shadowex3

    Shadowex3 Very Tilted

    So every day, more or less, the US government gives a press release on the previous day's airstrikes against ISIS in Iraq and Syria, including the rough area (by city) in which the strikes took place, how many strikes there were in that location, and what was destroyed. Every one of these releases going back to December 2014 has certain parts that are worded exactly the same... for example, the date always follows the same header text, and the cities/number of strikes always occur between the words "Near" and "Airstrike(s)".

    Because of those static parts I know it's very simple to write a script that grabs the date, city, and number of strikes using a regex and, from there, puts everything into a nice tidy dataset. The problem is I don't know enough actual raw coding to do that.
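    For what it's worth, the static wording really does make this a few lines in most languages. Here's a minimal sketch in Python (the thread's actual work ends up in R; the header wording and city lines below are made-up stand-ins, with only the "Near <city>, <count> airstrike(s)" pattern taken from the post):

```python
import re

# An illustrative snippet in the press-release format described above.
# The "SOUTHWEST ASIA, <date> -" header is an assumption for this sketch;
# the "Near <city>, <count> airstrike(s)" wording is the one the post quotes.
release = """SOUTHWEST ASIA, January 5, 2016 - Strikes continue
Near Raqqah, two airstrikes struck a tactical unit.
Near Mosul, one airstrike destroyed a fighting position."""

# Grab the date from the header line, then every (city, count) pair.
date = re.search(r"SOUTHWEST ASIA, (.+?) -", release).group(1)
strikes = re.findall(r"Near (.+?), (\w+) airstrike", release)

print(date)     # January 5, 2016
print(strikes)  # [('Raqqah', 'two'), ('Mosul', 'one')]
```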

    It's aggravating to know something can be done, know what needs to be done, and just not know how to do it.
     
  3. martian

    martian Server Monkey Staff Member

    Location:
    Mars
    Perl, motherfucker. Do you speak it?

    If you provide me with a link to these reports I can probably whip something up, it sounds super simple. Y'know, for someone who knows Perl.
     
  4. redux

    redux Very Tilted

    Location:
    Foggy Bottom
    Pearl rocked! :)

     
    • Like x 3
  5. Shadowex3

    Shadowex3 Very Tilted

    No, I don't. I speak English, Hebrew, and American Sign Language, plus R, and I can swear in all of those plus Finnish, Arabic, Spanish, and French. Which is why I'm trying to come up with a solution using regexes and R, since I'd like to feel less bad about putting that on my resume. I got some advice on reddit, including code to do exactly what I want with a single press release; I'm just working on extrapolating that to something that'll work properly on a massive facking textfile full of all the press releases from 2014 through the end of 2016, in reverse order. I've got a couple of options for that: tweaking their code to work on the whole thing sequentially, wrapping their code in another script that pulls each single press release into an object before their code runs and then starts over at the next one, or just splitting the press releases into a bunch of files or objects and running their code on each separately.
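    Of those three options, the split-it-up one tends to be the least code in any language. A rough sketch in Python (the poster is working in R; the shared header marker used to split on is an assumption):

```python
import re

# A tiny stand-in for the massive text file: every press release is assumed
# to open with the same dateline header.
archive = """SOUTHWEST ASIA, April 3, 2016 - body of release one
Near Somewhere, two airstrikes.
SOUTHWEST ASIA, April 4, 2016 - body of release two
Near Elsewhere, five airstrikes."""

# Split *before* each header using a lookahead, so the per-release code can
# simply loop over the chunks instead of being rewritten to run sequentially.
releases = [r for r in re.split(r"(?=SOUTHWEST ASIA,)", archive) if r.strip()]
print(len(releases))  # 2
```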

    The bright side is that R's designed from the ground up for exactly what I'm doing with it.
    It's also designed from the ground up to be an infuriating, painful, masochistic language to use. I've made compsci majors swear and shout "WHY" just by telling them it indexes from 1... and it gets worse from there.

    Hadley Wickham would probably be sainted if the pope were a coder.
     
  6. martian

    martian Server Monkey Staff Member

    Location:
    Mars
    I mean, do what you want, but R is a statistical analysis engine and what you have on your hands is a string manipulation problem. String manipulation is Perl's wheelhouse, which is why I suggested it. You can do this with R, or any other Turing complete language of your choice. You can also slice bread with a band saw, but that doesn't mean it's a good idea.

    If you're after resume fodder, Python is better than either of those.
     
    • Like x 1
  7. Shadowex3

    Shadowex3 Very Tilted

    Python's something I want to learn, but in the social sciences R's what everyone has a hard-on for. I don't think it's as bad as you make it out to be, because this kind of data scraping is one of the major things it's used for, with packages like the tidyverse and full regex support.
     
  8. Wildmermaid

    Wildmermaid Very Tilted

    Location:
    Pacific Northwest
    migraine/pressure/bitch bitch/migraine/plan/migraine, yeah that covers it ~ I wish the monster in my brain would cease the rampage and just let me be for a bit.
     
  9. Cayvmann

    Cayvmann Very Tilted

    EdX.org has classes in both that you can audit for free.
     
  10. Remixer

    Remixer Middle Eastern Doofus

    Location:
    Frankfurt, Germany
    It bothers me to no end that I had a more reliable internet connection in Afghanistan than in the first-world country that is Australia.

    The amount of infrastructure stupidity in this country is killing me.

    When you're used to stable 50-250 Mbps internet for the past decade across 3 countries, living (and doing international work) here becomes suffering.

    Sigh. At least the weather is great and the overpriced drinks are still good.
     
  11. Shadowex3

    Shadowex3 Very Tilted

    So since that post I've had a bit of a rollercoaster. Every press release had some static text before and after the date, as well as the city and number of airstrikes; that, combined with some R code someone on reddit wrote to answer my question, let me cobble together a regex which would effectively grab everything I needed and process a press release into a dataframe. The downside was that the method only worked on one press release at a time and couldn't handle matching dates with the subsequent airstrike listings.

    So I tried inverting the regex to delete everything but my matches, only there was no way to make that work. So I tried copying just the pieces I needed, but kept getting the whole paragraph up to a line break (i.e. the entire object in the vector). Finally someone on stackoverflow suggested a grep function that would let me pull out just the matches and not the full line, and that got me to the point of having an object that looked like this:

    Code:
    Apriltember 34, 2094
    Near Bumfuck, Three 
    Near Nowhere, Five
    Near ISIStown, 17
    Apriltember 35th, 2094
    Near Tiz Al-Nubbe, Seven
    Near Habibi, Nine
    Near Ibn Sharmuta, Five
    From there I had another dilemma: getting each strike correlated with the preceding date. I tried using the spreadsheet's IF function and a few others to make a formula that checked if the leftward cell contained a date and either reproduced that or the date from the cell above it, but I always got an error. Then I realised there were MONTH, DAY, and YEAR as well as IFERROR. That worked perfectly, since it either matched a date in the leftward cell or spat out an error, and in case of an error it gave me the cell above it (i.e. the previous date).

    From there it was easy to clean up the data by hand, split apart the cities and strike count (although I later found the tidyverse had a function for that), and produce a three-column dataset: Date, City, Strike Count. I was at the point where I could start using ggplot2 to make various charts of the number of strikes per city, line charts of strikes over time, etc.
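    That spreadsheet trick is a forward-fill: carry the most recent date down onto every strike line below it. A minimal sketch of the same idea in Python, reusing the made-up lines from the intermediate object above (the actual work was done in R and LibreOffice Calc):

```python
import re

# The intermediate object from the post, as a list of lines.
lines = [
    "Apriltember 34, 2094",
    "Near Bumfuck, Three",
    "Near Nowhere, Five",
    "Apriltember 35th, 2094",
    "Near Tiz Al-Nubbe, Seven",
]

# Walk the lines once: remember the most recent date line and attach it to
# every "Near ..." line that follows -- the same thing the IFERROR formula
# does by falling back to the cell above.
rows, current_date = [], None
for line in lines:
    m = re.match(r"Near (.+), (\w+)", line)
    if m:
        rows.append((current_date, *m.groups()))
    else:
        current_date = line

print(rows)
```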

    The problem is that while that's a proper Tidy (i.e. Long) dataset, it's not what the professor who assigned me the project wanted. He wanted every city as its own column, with the strike count for each date below it. So I kept trying to pivot the dataset and it wouldn't work, until I realised I needed another column just containing the row number, which could serve as a unique identifier. The funny thing is I didn't realise that was the issue at first, so I had a moment where I finally got the reformatting to work and immediately went "My code just worked and I have no idea why". Then I fucked up and overwrote everything and had to do it again, which is when I realised I needed the ID column.

    But the problem there was that I had one row per strike per city. So there were, for example, four rows for the 20th of July, each with a single city's data in it, instead of one row for July 20th with all four cities on it. And every time I tried to condense that down it threw an error, which left me going "My code doesn't work and I have no idea why". Turns out the ID column was preventing me from condensing all the rows in the Date column that had identical data. So first I needed it, then I needed to get rid of it.
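    For comparison, in Python's pandas the pivot and the condensing are a single call, which sidesteps the add-an-ID-then-delete-it dance entirely. A sketch using the thread's made-up places (the actual work was in R/tidyverse; counts converted to numbers for summing):

```python
import pandas as pd

# The tidy three-column dataset: one row per city per date.
tidy = pd.DataFrame({
    "Date":    ["Apriltember 34", "Apriltember 34", "Apriltember 35"],
    "City":    ["Bumfuck", "Nowhere", "Bumfuck"],
    "Strikes": [3, 5, 7],
})

# pivot_table aggregates duplicate Date rows itself (summing here), so no
# unique-ID column is ever needed -- and none has to be removed afterwards.
wide = tidy.pivot_table(index="Date", columns="City", values="Strikes",
                        aggfunc="sum")
print(wide)
```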

    At that point all of us interns were realising we were being completely misused and wasted, so I decided to push my luck and said "fuck it, let's do everything". The professor wants to take the dataset and use ArcGIS to make a map of all the strikes over time. I made another spreadsheet with the cities and their latitude/longitude and started working on matching that up to my primary dataset, which had dozens of rows for every city. My first idea was an if_else statement, but I could never get it to work even with one city, and I realised that nesting fifty of them was one of those little signs you should refactor your code.

    So it was back to browsing stackoverflow. I was looking for examples of how to make a function that would use one dataframe as a lookup table for another. What I found was that it takes literally a quarter of a single line of code to just merge the lookup table into the primary dataset and have it autopopulate two new columns based on matching the City column.
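    The same lookup-table merge exists in Python's pandas. A sketch with a hypothetical coordinates table and the thread's made-up data (the real work was R's merge on much larger frames):

```python
import pandas as pd

# Hypothetical lookup table of coordinates, one row per city.
coords = pd.DataFrame({
    "City": ["Bumfuck", "Nowhere"],
    "Lat":  [34.1, 35.2],
    "Lon":  [43.5, 44.1],
})

# The primary dataset -- dozens of rows per city in the real thing.
strikes = pd.DataFrame({
    "Date":  ["Apriltember 34", "Apriltember 34", "Apriltember 35"],
    "City":  ["Bumfuck", "Nowhere", "Bumfuck"],
    "Count": [3, 5, 7],
})

# A single merge matches on City and autopopulates Lat/Lon on every row.
merged = strikes.merge(coords, on="City", how="left")
print(merged)
```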

    Which brings me to where I am now. Either Leaflet was very easy to figure out, or I've just finally started to not be a complete idiot, and it was super easy to make a map of all the strikes using either jittered single markers or auto-condensing numeric markers.

    Here's hoping they actually turn this internship into something decent, instead of having me sit in a room using Google Scholar to write a paper in between doing menial outsource-grade clerical work.
     
  12. DAKA

    DAKA DOING VERY NICELY, THANK YOU

    What are you guys talking about?
    ArcGIS, stackoverflow, IFERROR.......
    As an architect, retired for 27 years, I come from making lines on paper with pencils and using a slide rule, and you speak a different language...
     
    • Like x 1
  13. Shadowex3

    Shadowex3 Very Tilted

    ArcGIS is a program for analyzing data on maps. Stackoverflow is a website, like reddit, but specifically for asking for help with coding-type problems. IFERROR is a function in LibreOffice Calc (a free spreadsheet program, like Excel) that evaluates a formula and returns its result, or, if the formula produces an error, returns an alternative value instead.

    If you really want your head to hurt: everything I just described, I did using stuff like this:

    It took me a week, a reddit post, and two posts on stackoverflow to figure out how to write those two lines of code.

    Starting at the left here's a slightly simplified explanation of what the first function on the first line does:

    Create an object named (or assign to the object named) datesexcel all data pulled from the object named linesexcel that matches the following pattern: anything which occurs after the exact string SOUTHWEST ASIA, and before the character -, OR which occurs after the exact string Near and before the exact string airstrik.

    That's everything from "datesexcel" to the closing parenthesis of regexpr(). The rest takes the output of regexpr() and uses "%>%" to send it to other functions that do more things; for example, the parts that say "gsub" are deleting extra commas and bits I didn't need.
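    The original R lines aren't preserved in this archive, but the pattern described above is easy to sketch. As a rough Python analogue (R's regexpr/gsub pipeline swapped for a single re.findall with an alternation; all text made up):

```python
import re

# Made-up text in the press-release format the post describes.
text = """SOUTHWEST ASIA, Apriltember 34, 2094 - Strikes continue
Near Bumfuck, three airstrikes hit a unit.
Near Nowhere, five airstrikes destroyed vehicles."""

# One alternation: either the date (after "SOUTHWEST ASIA," and before "-")
# or a city + count (after "Near" and before "airstrik").
pattern = r"SOUTHWEST ASIA, (.*?) -|Near (.*?) airstrik"
matches = [a or b for a, b in re.findall(pattern, text)]
print(matches)  # ['Apriltember 34, 2094', 'Bumfuck, three', 'Nowhere, five']
```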


    After doing this I cleaned everything up by hand in a spreadsheet and then imported it back into RStudio to play with some more. There's definitely a way to do everything I did by hand using more code, but some things, like matching up data entries to the date based solely on the order they appear in the file, are much easier to do in a spreadsheet, where something's physical position matters.
     
  14. DAKA

    DAKA DOING VERY NICELY, THANK YOU

    Shadowex3...Thank you for that "explanation"....<VBG> clear as mud.
    But, then again, "It don't matter to me" (lyrics from a song?...)
    Lately I can't handle price-checking supermarket items, this from someone who "used to" be able to handle 3 or 4 multi-million dollar construction projects at the same time... think about this: when I retired (1990), there were NO COMPUTERS in common use....
    Simpler times back then, but we didn't know it!!!
     
  15. rogue49

    rogue49 Tech Kung Fu Artist Staff Member

    Location:
    Baltimore/DC
    I want to do something
    But I don't...

    Things are a changin'
    I'm just going to hold on for the ride.

    Don't know where the curves are...
     
    • Like x 2
  16. Remixer

    Remixer Middle Eastern Doofus

    Location:
    Frankfurt, Germany
    Why... why would you approach someone to provide business & sales consulting for your company if you already know what strategic changes you want to implement for sales, and have no real desire to look into the management, operations, or finance sides of the business?

    I literally don't even know how to respond to that. What the...
     
  17. Wildmermaid

    Wildmermaid Very Tilted

    Location:
    Pacific Northwest
    Sick sick and lower spine dislocated today. Fuckity fuck fuck. Hopefully tomorrow for gym and SWIMMING :D . PTSD is out of control today but that is okay as tomorrow will be better. Lots of tears but sometimes they really are cleansing.
     
  18. rogue49

    rogue49 Tech Kung Fu Artist Staff Member

    Location:
    Baltimore/DC
    I hope you feel better soon
     
    • Like x 1
  19. Wildmermaid

    Wildmermaid Very Tilted

    Location:
    Pacific Northwest
    Thank you so much! :)
     
  20. DAKA

    DAKA DOING VERY NICELY, THANK YOU

    Hey, Feel better...I enjoy the back and forth with you...
     
    • Like x 1